---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-2
language:
- is
base_model:
- HuggingFaceTB/SmolLM2-135M-Instruct
pipeline_tag: text-generation
---
|
This is a SmolLM2-135M-Instruct model fine-tuned on the Icelandic portion of FineWeb-2. It is intended for my own research and has not yet been evaluated more broadly.
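
Since this is a standard SmolLM2 fine-tune, it should load with the usual `transformers` text-generation pipeline. A minimal usage sketch follows; the repository ID is a placeholder for this model's actual ID:

```python
# Minimal usage sketch with the transformers text-generation pipeline.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="your-username/SmolLM2-135M-Instruct-fineweb2-is",  # placeholder repo ID
)

# Icelandic prompt: "Reykjavik is the capital of Iceland."
output = generator(
    "Reykjavík er höfuðborg Íslands.",
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
)
print(output[0]["generated_text"])
```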
|
|
|
Training:

- Epochs: 1
- Learning rate: 5e-4
- LR scheduler: cosine
- Warmup ratio: 0.05
- Per-device batch size: 1
- GPUs: 4× A100 (40 GB)
- Gradient accumulation steps: 64
- Effective batch size: 256 (1 × 4 GPUs × 64 accumulation steps)
- Max. context length: 8192 tokens
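
For reference, these hyperparameters map onto a standard Hugging Face `Trainer` run roughly as sketched below. This is not the actual training script: the FineWeb-2 config name, the tokenization step, and the bf16 choice are assumptions.

```python
# Hedged sketch of the fine-tuning setup implied by the hyperparameters above.
# Launch across 4 GPUs with, e.g.: torchrun --nproc_per_node=4 train.py
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "HuggingFaceTB/SmolLM2-135M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Icelandic subset of FineWeb-2; config name assumed per the dataset card.
dataset = load_dataset("HuggingFaceFW/fineweb-2", name="isl_Latn", split="train")

def tokenize(batch):
    # Truncate to the 8192-token max context length listed above.
    return tokenizer(batch["text"], truncation=True, max_length=8192)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="smollm2-135m-is",
    num_train_epochs=1,
    learning_rate=5e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    per_device_train_batch_size=1,   # 1 per GPU
    gradient_accumulation_steps=64,  # 1 x 4 GPUs x 64 = 256 effective
    bf16=True,                       # assumption: bf16 on A100s
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```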