small-scale pretraining experiments of mine
- BEE-spoke-data/smol_llama-101M-GQA
  Text Generation • 0.1B • Updated • 3.49k • 28
- BEE-spoke-data/smol_llama-220M-GQA
  Text Generation • 0.2B • Updated • 4.14k • 13
- BEE-spoke-data/smol_llama-220M-GQA-fineweb_edu
  Text Generation • 0.2B • Updated • 7 • 1
- BEE-spoke-data/smol_llama-81M-tied
  Text Generation • 0.1B • Updated • 1.86k • 6