---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-2
language:
- is
pipeline_tag: text-generation
library_name: transformers
---

This is a model with the same specifications as SmolLM2-135M, trained from scratch on the Icelandic portion of FineWeb-2. It is intended as a baseline for my research and is probably rather bad for most purposes :)

Training:
- 1 epoch
- Learning rate: 5e-4
- LR scheduler: cosine
- Warmup ratio: 0.05
- Per-device batch size: 1
- 4 A100 (40GB) GPUs
- Gradient accumulation steps: 64
- Effective batch size: 256 (1 × 4 GPUs × 64 accumulation steps)
- Max. context length: 8192 tokens
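
Since the card declares `library_name: transformers` and `pipeline_tag: text-generation`, the model should load through the standard causal-LM interface. A minimal usage sketch follows; the model ID is a placeholder, not the actual repository name:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical model ID -- replace with this repository's actual name.
model_id = "your-username/smollm2-135m-icelandic"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short Icelandic continuation from a prompt.
inputs = tokenizer("Ísland er", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```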