add more info
README.md
CHANGED
@@ -1,4 +1,6 @@
 # GPT-2 (125M) 4k tokens
 
 Fine-tuned GPT2 Smallest model on The Pile with a token length of 4k.
-Weights are included and it follows Karpathy's nanoGPT implementation.
+Weights are included and it follows Karpathy's nanoGPT implementation.
+The model has been trained for ~1 million iterations with an increasing batch size, ending at 32k.
+The final loss is 3.9, which is probably due to the 768 embedding size.
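
A rough sketch of the nanoGPT-style configuration this checkpoint likely corresponds to. Only the 4k context length and the 768 embedding size are stated above; the 12-layer / 12-head layout and the padded 50304 vocabulary are nanoGPT's usual GPT-2 small defaults and are assumptions here.

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    block_size: int = 4096   # 4k token context length stated above
    vocab_size: int = 50304  # GPT-2 BPE vocab padded to a multiple of 64 (nanoGPT default, assumed)
    n_layer: int = 12        # standard GPT-2 small depth (assumed)
    n_head: int = 12         # standard GPT-2 small head count (assumed)
    n_embd: int = 768        # embedding size stated above
    dropout: float = 0.0
    bias: bool = True        # GPT-2 uses biases in Linear and LayerNorm layers

# Assuming this layout, the model falls in the ~124M-parameter GPT-2 small
# ("125M") class referred to in the title.
config = GPTConfig()
print(config)
```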