Pretrain Data

HuggingFaceTB/smollm-corpus • Updated Sep 6, 2024 • 237M • 14.1k • 323
HuggingFaceFW/fineweb-edu-classifier • Text Classification • Updated Nov 17, 2024 • 274k • 172
HuggingFaceFW/fineweb • Updated Jan 31 • 25B • 190k • 2.1k
togethercomputer/RedPajama-Data-V2 • Updated Nov 21, 2024 • 4.21k • 359