---
datasets:
- EleutherAI/the_pile_deduplicated
pipeline_tag: text-generation
library_name: transformers
---

# Pythia 1.4b Deduped with 8k Context Window

This model fine-tunes Pythia 1.4b with an extended context window of 8k tokens. With optimizations like Flash Attention and bitsandbytes, the entire model fit on a single A100 (40 GB) with a batch size of 1. Fine-tuning took ~30 hours, after which the loss was similar to that of fine-tuning at the original 2k-token context window.
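A minimal usage sketch with `transformers`. The model id below is an assumption based on this repository's author and model name; replace it with the actual repo path if it differs.

```python
# Minimal text-generation sketch for this model.
# NOTE: the model id is hypothetical — substitute the real repository path.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "naxautify/pythia-1.4b-deduped-8k"  # assumed id, adjust as needed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Prompts of up to 8192 tokens fit within the extended context window.
inputs = tokenizer("The Pile is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Loading with `device_map="auto"` places the weights on the available GPU; for tighter memory budgets, `load_in_8bit=True` (via bitsandbytes) can be passed to `from_pretrained`.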