Update README.md
README.md CHANGED
@@ -1,9 +1,20 @@
 ---
 license: apache-2.0
+datasets:
+- roneneldan/TinyStories
+language:
+- en
 ---
 models are in models/
+
 names follow the pattern model_dimension-n_layers (768-8 is not fully trained, but the loss is pretty flat)
+
 inside models/old/ there are models that were trained on the non-cleaned dataset (with a tokenizer trained on that dataset) (I think all of them are fully trained, but some are missing from my wandb)
+
 tok4096.model was trained on the cleaned dataset, tok4096_old.model on the non-cleaned one
+
 train_snakes.py is the training script (you need to change the outdir, d_model and n_layer). It initializes the Mamba model using the MambaLMHeadModel class.
+
 model.py is where the MambaLMHeadModel class is defined.
+
+context length is 256
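For quick inference against these checkpoints, the README above names all the pieces: a model file from models/, the matching SentencePiece tokenizer (tok4096.model, or tok4096_old.model for the models in models/old/), and the MambaLMHeadModel class from model.py. Below is a minimal sketch; the checkpoint filename and extension, the state-dict layout, the constructor keyword arguments, and the generate() signature are assumptions that depend on how train_snakes.py saves checkpoints and how model.py defines the class, so adjust them to the actual code.

```python
# Minimal inference sketch -- file names, checkpoint layout, constructor
# arguments, and generate() signature are assumptions; adapt to model.py
# and to the checkpoints actually written by train_snakes.py.
import torch
import sentencepiece as spm

from model import MambaLMHeadModel  # defined in this repo's model.py

CONTEXT_LENGTH = 256  # the models were trained with a context length of 256

# Tokenizer for the cleaned dataset; use tok4096_old.model for models/old/.
sp = spm.SentencePieceProcessor(model_file="tok4096.model")

# Hypothetical checkpoint path following the model_dimension-n_layers naming.
ckpt = torch.load("models/768-8.pt", map_location="cpu")

# Assumed constructor: model.py may instead expect a config object.
model = MambaLMHeadModel(d_model=768, n_layer=8, vocab_size=sp.vocab_size())
state = ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt
model.load_state_dict(state)
model.eval()

# Keep the prompt within the 256-token training context.
prompt_ids = sp.encode("Once upon a time")[-CONTEXT_LENGTH:]
with torch.no_grad():
    out = model.generate(torch.tensor([prompt_ids]), max_length=CONTEXT_LENGTH)  # assumed signature
print(sp.decode(out[0].tolist()))
```

Truncating the prompt to the last 256 tokens matters because the models never saw longer sequences during training.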