naxautify/pythia-1.4b-deduped-8k

	---
	datasets:
	- EleutherAI/the_pile_deduplicated
	pipeline_tag: text-generation
	library_name: transformers
	---

	# Pythia 1.4b Deduped with 8k Context Window

	This model is a fine-tune of Pythia 1.4b (deduped) with a context window of 8k tokens. With optimizations like Flash Attention and bitsandbytes, I could fit the entire model, at a batch size of 1, on a single A100 (40 GB). The fine-tuning took ~30 hours, after which the loss was similar to that of fine-tuning at a context window of 2k tokens.
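
A minimal sketch of loading the model for generation with `transformers` (the `library_name` declared above). The repo id is taken from this page; everything else is an assumption: that "8k" means 8192 tokens, that the repo ships a tokenizer, and that default precision fits your hardware. The helper names `max_new_tokens_for` and `generate` are hypothetical, introduced only for illustration.

```python
MODEL_ID = "naxautify/pythia-1.4b-deduped-8k"  # repo id as shown on this model page
ASSUMED_CONTEXT = 8192  # assumption: "8k" is read as 8192 tokens


def max_new_tokens_for(prompt_len: int, context: int = ASSUMED_CONTEXT) -> int:
    """Return how many tokens can still be generated inside the context window."""
    if prompt_len < 0:
        raise ValueError("prompt_len must be non-negative")
    return max(context - prompt_len, 0)


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Greedy-decode a continuation of `prompt` (hypothetical helper)."""
    # Local imports keep the pure helper above usable without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the repo includes tokenizer files alongside the weights.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[1]

    # Never request more tokens than the assumed window leaves room for.
    budget = min(max_new_tokens, max_new_tokens_for(prompt_len))
    if budget == 0:
        raise ValueError("prompt already fills the assumed context window")

    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=budget)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("The Pile is a dataset that")` downloads the weights on first use and returns the prompt plus its continuation. Capping the request with `max_new_tokens_for` is a design choice to keep prompt and output together within the fine-tuned window; it does not reproduce the Flash Attention or bitsandbytes setup used for training.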