Update README.md
README.md CHANGED
@@ -1,9 +1,20 @@
 ---
 license: apache-2.0
+datasets:
+- roneneldan/TinyStories
+language:
+- en
 ---
 models are in models/
+
 names follow the pattern model_dimension-n_layers (768-8 is not fully trained, but the loss is pretty flat)
+
 inside models/old/ there are models that were trained on the non-cleaned dataset (with a tokenizer trained on that dataset) (I think all of them are fully trained, but some are missing from my wandb)
+
 tok4096.model was trained on the cleaned dataset, tok4096_old.model on the non-cleaned one
+
 train_snakes.py is the training script (you need to change the outdir, d_model and n_layer). It initializes the Mamba model using the MambaLMHeadModel class.
+
 model.py is where the MambaLMHeadModel class is defined.
+
+context length is 256
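For quick inference against these checkpoints, the README above names all the pieces: a model file from models/, the matching SentencePiece tokenizer (tok4096.model, or tok4096_old.model for the models in models/old/), and the MambaLMHeadModel class from model.py. Below is a minimal sketch; the checkpoint filename and extension, the state-dict layout, the constructor keyword arguments, and the generate() signature are assumptions that depend on how train_snakes.py saves checkpoints and how model.py defines the class, so adjust them to the actual code.

```python
# Minimal inference sketch -- file names, checkpoint layout, constructor
# arguments, and generate() signature are assumptions; adapt to model.py
# and to the checkpoints actually written by train_snakes.py.
import torch
import sentencepiece as spm

from model import MambaLMHeadModel  # defined in this repo's model.py

CONTEXT_LENGTH = 256  # the models were trained with a context length of 256

# Tokenizer for the cleaned dataset; use tok4096_old.model for models/old/.
sp = spm.SentencePieceProcessor(model_file="tok4096.model")

# Hypothetical checkpoint path following the model_dimension-n_layers naming.
ckpt = torch.load("models/768-8.pt", map_location="cpu")

# Assumed constructor: model.py may instead expect a config object.
model = MambaLMHeadModel(d_model=768, n_layer=8, vocab_size=sp.vocab_size())
state = ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt
model.load_state_dict(state)
model.eval()

# Keep the prompt within the 256-token training context.
prompt_ids = sp.encode("Once upon a time")[-CONTEXT_LENGTH:]
with torch.no_grad():
    out = model.generate(torch.tensor([prompt_ids]), max_length=CONTEXT_LENGTH)  # assumed signature
print(sp.decode(out[0].tolist()))
```

Truncating the prompt to the last 256 tokens matters because the models never saw longer sequences during training.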