$μ$-Parametrization for Mixture of Experts • Paper • 2508.09752 • Published 3 days ago
How to train a Language Model with Megatron-LM • Article • By loubnabnl • Sep 7, 2022
NVIDIA Releases 3 Million Sample Dataset for OCR, Visual Question Answering, and Captioning Tasks • Article • By nvidia and 4 others • 5 days ago
SmolLM3 pretraining datasets • Collection • Datasets used in SmolLM3 pretraining • 15 items • Updated 4 days ago
Welcome GPT OSS, the new open-source model family from OpenAI! • Article • By reach-vb and 11 others • 12 days ago
retrain-pipelines and the almighty function-caller • Article • By Aurelien-Morgan • Apr 28
Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding • Paper • 2507.19427 • Published 22 days ago
Introducing Command A Vision: Multimodal AI built for Business • Article • By CohereLabs and 3 others • 16 days ago