naxalpha committed on
Commit
fca7629
1 Parent(s): 113af5d

add more info

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -1,4 +1,6 @@
 # GPT-2 (125M) 4k tokens
 
 Fine-tuned GPT2 Smallest model on The Pile with a token length of 4k.
-Weights are included and it follows Karpathy's nanoGPT implementation.
+Weights are included and it follows Karpathy's nanoGPT implementation.
+The model has been trained for ~1 million iterations with an increasing batch size, ending at 32k.
+The final loss is 3.9, which is probably due to the 768 embedding size.
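The README cites 125M parameters, a 4k context, and a 768 embedding size. A minimal sketch of the corresponding nanoGPT-style model config can check that these numbers are mutually consistent. The `GPTConfig` field names mirror Karpathy's nanoGPT; the 12-layer/12-head values are assumptions carried over from standard GPT-2 small, not stated in this commit.

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    # nanoGPT-style hyperparameters; block_size raised to the 4k
    # context length this fine-tune uses (GPT-2 default is 1024)
    block_size: int = 4096
    vocab_size: int = 50257
    n_layer: int = 12   # assumed: GPT-2 small depth
    n_head: int = 12    # assumed: GPT-2 small head count
    n_embd: int = 768   # the 768 embedding size mentioned above

def approx_params(cfg: GPTConfig) -> float:
    """Rough parameter count: embeddings plus transformer block weights."""
    tok_emb = cfg.vocab_size * cfg.n_embd
    pos_emb = cfg.block_size * cfg.n_embd
    # per block: attention qkv+proj (4 * d^2) and MLP up+down (8 * d^2)
    per_block = 12 * cfg.n_embd ** 2
    return tok_emb + pos_emb + cfg.n_layer * per_block

cfg = GPTConfig()
print(f"~{approx_params(cfg) / 1e6:.0f}M parameters")  # lands near 125M
```

With these assumed values the estimate comes out around 127M, close to the 125M in the title; extending `block_size` from 1024 to 4096 only adds about 2.4M position-embedding parameters, so the context change barely moves the total.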