jekunz committed on
Commit 955082b · verified · 1 Parent(s): 686b044

Create README.md

Files changed (1)
  1. README.md +22 -0
README.md ADDED
@@ -0,0 +1,22 @@
+ ---
+ license: apache-2.0
+ datasets:
+ - HuggingFaceFW/fineweb-2
+ language:
+ - is
+ pipeline_tag: text-generation
+ library_name: transformers
+ ---
+
+ This is a model with the same specifications as SmolLM2-135M, trained from scratch on the Icelandic portion of FineWeb-2. It is intended as a baseline for my research and is probably rather bad for most purposes :)
+
+ Training:
+ - 1 epoch
+ - Learning rate: 5e-4
+ - LR scheduler: Cosine
+ - Warmup ratio: 0.05
+ - Per-device batch size: 1
+ - 4 A100 (40GB) GPUs
+ - Gradient accumulation steps: 64
+ - Effective batch size: 256 (1 per device × 4 GPUs × 64 accumulation steps)
+ - Max. context length: 8192 tokens
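
The training settings above can be sanity-checked with a little arithmetic. The sketch below is not the author's training script. It assumes the batch size of 1 is per GPU, that warmup is linear, and that the cosine schedule decays to zero by the last step. The function names and the `total_steps` value are hypothetical and chosen only for illustration, since the README does not state the number of optimizer steps.

```python
import math

# Values taken from the README's training list.
PER_DEVICE_BATCH_SIZE = 1   # assumed to be per GPU
NUM_GPUS = 4
GRAD_ACCUM_STEPS = 64
PEAK_LR = 5e-4
WARMUP_RATIO = 0.05
MAX_CONTEXT_LENGTH = 8192


def effective_batch_size(per_device=PER_DEVICE_BATCH_SIZE,
                         gpus=NUM_GPUS,
                         accum=GRAD_ACCUM_STEPS):
    """Sequences contributing to one optimizer step."""
    return per_device * gpus * accum


def lr_at_step(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Linear warmup followed by cosine decay to zero (an assumed schedule shape)."""
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp linearly from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total_steps = 1000  # hypothetical; the real step count is not given
    print("effective batch size:", effective_batch_size())
    print("max tokens per optimizer step:",
          effective_batch_size() * MAX_CONTEXT_LENGTH)
    for step in (0, 25, 50, 525, 1000):
        print(step, lr_at_step(step, total_steps))
```

Under these assumptions, one optimizer step sees 256 sequences, which is at most 256 × 8192 tokens when every sequence fills the full context.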