---
datasets:
- EleutherAI/the_pile_deduplicated
pipeline_tag: text-generation
library_name: transformers
---

# Pythia 1.4b Deduped with 8k Context Window

This model fine-tunes Pythia 1.4b (deduped) with a context window of 8k tokens. With optimizations like Flash Attention and bitsandbytes, I could fit the entire model on a single A100 (40 GB) with a batch size of 1. The fine-tuning took ~30 hours, after which the loss was similar to that of fine-tuning at the original 2k-token context window.
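As a rough sanity check on fitting the model in 40 GB, here is a back-of-envelope memory estimate. All byte counts below are illustrative assumptions (fp16 weights and gradients, bitsandbytes-style 8-bit optimizer states), not measurements from the actual fine-tuning run:

```python
# Rough memory estimate for fine-tuning a 1.4B-parameter model on a 40 GB A100.
# All byte counts are assumptions for illustration, not measured values.

params = 1.4e9

weights_gb = params * 2 / 1e9    # fp16 weights: ~2.8 GB
grads_gb = params * 2 / 1e9      # fp16 gradients: ~2.8 GB
# an 8-bit Adam optimizer keeps two ~1-byte states per parameter
optimizer_gb = params * 2 / 1e9  # ~2.8 GB

static_gb = weights_gb + grads_gb + optimizer_gb
print(f"static memory: ~{static_gb:.1f} GB")

# Flash Attention avoids materializing the full 8192 x 8192 attention
# matrix, so activation memory grows roughly linearly (not quadratically)
# with sequence length; the remaining ~30 GB covers activations at
# batch size 1.
assert static_gb < 40
```

The point of the estimate is that the fixed cost (weights, gradients, optimizer states) is small; at an 8k context, the activation memory of attention is what the Flash Attention and bitsandbytes optimizations keep in check.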