MrGonao committed on
Commit d9f0863 · verified · 1 Parent(s): 764811a

Update README.md

Files changed (1)
  1. README.md +11 -0
README.md CHANGED
@@ -1,9 +1,20 @@
  ---
  license: apache-2.0
+ datasets:
+ - roneneldan/TinyStories
+ language:
+ - en
  ---
  models are in models/
+
  names are model_dimension and n_layers (768-8 is not fully trained, but the loss is pretty flat)
+
  inside models/old/ there are models that were trained on the non-cleaned dataset (with a tokenizer trained on that dataset) (I think all of them are fully trained, but some are missing from my wandb)
+
  tok4096.model is of the cleaned dataset, tok4096_old.model is on the non_cleaned one
+
  train_snakes.py is the training script (you need to change the outdir, d_model and n_layer). It initializes the mamba using the MambaLMHeadModel class.
+
  model.py is where the MambaLMHeadModel class is defined.
+
+ context length is 256
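
The README says model names encode model_dimension and n_layers, with 768-8 as an example, and that train_snakes.py needs d_model and n_layer set by hand. A minimal sketch of turning such a name into those two values follows. It assumes names are exactly `<d_model>-<n_layer>` with no prefix or file extension; `MambaSize` and `parse_model_name` are hypothetical helpers for illustration and are not part of this repository.

```python
from typing import NamedTuple


class MambaSize(NamedTuple):
    """Hypothetical container for the two values encoded in a model name."""
    d_model: int
    n_layer: int


def parse_model_name(name: str) -> MambaSize:
    """Split a name such as "768-8" into (d_model, n_layer).

    Assumes the '<d_model>-<n_layer>' convention described in the README;
    anything else raises ValueError.
    """
    d_model_text, sep, n_layer_text = name.partition("-")
    if not sep or not d_model_text.isdecimal() or not n_layer_text.isdecimal():
        raise ValueError(f"expected '<d_model>-<n_layer>', got {name!r}")
    return MambaSize(int(d_model_text), int(n_layer_text))


print(parse_model_name("768-8"))  # MambaSize(d_model=768, n_layer=8)
```

The parsed values could then be used for the d_model and n_layer settings that train_snakes.py asks you to change.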