jekunz/smollm-135m-fineweb-icelandic-from-scratch


jekunz committed 10 days ago

Commit 955082b · verified · 1 Parent(s): 686b044

Create README.md

Files changed (1)

README.md +22 -0

README.md ADDED

	@@ -0,0 +1,22 @@

+---
+license: apache-2.0
+datasets:
+- HuggingFaceFW/fineweb-2
+language:
+- is
+pipeline_tag: text-generation
+library_name: transformers
+---
+This is a model with the same specifications as SmolLM2-135M, trained from scratch on the Icelandic portion of FineWeb-2. It is intended as a baseline for my research and is probably rather bad for most purposes :)
+Training:
+- 1 Epoch
+- Learning rate: 5e-4
+- LR scheduler: Cosine
+- Warmup ratio: 0.05
+- Batch size: 1 (per GPU)
+- 4 A100 (40GB) GPUs
+- Gradient accumulation steps: 64
+- Effective batch size: 256
+- Max. context length: 8192 tokens
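
The training numbers above fit together arithmetically, and a minimal sketch can make that explicit. It assumes the batch size of 1 is per GPU (which matches 1 × 4 × 64 = 256) and that the cosine schedule uses a linear warmup followed by decay to zero. The README does not spell out the exact schedule shape, so `lr_at_step` is an illustration, and the `total_steps` value in the usage part is hypothetical.

```python
import math

# Values copied from the training list in the README.
PER_DEVICE_BATCH_SIZE = 1   # assumption: "Batch size: 1" is per GPU
NUM_GPUS = 4
GRAD_ACCUM_STEPS = 64
MAX_CONTEXT_LENGTH = 8192
PEAK_LR = 5e-4
WARMUP_RATIO = 0.05


def effective_batch_size(per_device, num_gpus, grad_accum_steps):
    """Number of sequences that contribute to one optimizer step."""
    return per_device * num_gpus * grad_accum_steps


def lr_at_step(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Linear warmup to peak_lr, then cosine decay to zero (assumed shape)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    ebs = effective_batch_size(PER_DEVICE_BATCH_SIZE, NUM_GPUS, GRAD_ACCUM_STEPS)
    print("effective batch size:", ebs)
    # Upper bound: assumes every sequence fills the full context window.
    print("max tokens per optimizer step:", ebs * MAX_CONTEXT_LENGTH)

    total_steps = 1000  # hypothetical; the real step count is not given
    for step in (0, 50, 525, 1000):
        print(step, lr_at_step(step, total_steps))
```

With full-length sequences, one optimizer step would cover at most 256 × 8192 tokens; shorter documents would lower that figure unless the data was packed, which the README does not say.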