μ-Parametrization for Mixture of Experts Paper • 2508.09752 • Published 2 days ago
HuggingFaceTB/SmolLM3-3B-Base Text Generation • 3B • Updated about 15 hours ago
Article How to train a Language Model with Megatron-LM By loubnabnl • Sep 7, 2022
Article NVIDIA Releases 3 Million Sample Dataset for OCR, Visual Question Answering, and Captioning Tasks By nvidia and 4 others • 4 days ago
SmolLM3 pretraining datasets Collection • Datasets used in SmolLM3 pretraining • 15 items • Updated 2 days ago
Article Welcome GPT OSS, the new open-source model family from OpenAI! By reach-vb and 11 others • 10 days ago
Article retrain-pipelines and the almighty function-caller By Aurelien-Morgan • Apr 28
Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding Paper • 2507.19427 • Published 21 days ago
Article Introducing Command A Vision: Multimodal AI built for Business By CohereLabs and 3 others • 15 days ago