Physical AI Collection Collection of commercial-grade datasets for physical AI developers • 10 items • Updated 27 minutes ago • 25
DRAMA Collection A collection of small (sub-1B) multilingual dense retrievers that generalize well across a number of tasks and languages. • 3 items • Updated 28 days ago • 4
NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions Paper • 2502.13124 • Published Feb 18 • 5
OpenR1-Math Collection Dataset and SFT model distilled from DeepSeek-R1. Check out our blog post for more details: https://huggingface.co/blog/open-r1/update-2 • 3 items • Updated 15 days ago • 7
Llasa Collection TTS foundation model compatible with Llama framework (160k hours tokenized speech data released) • 11 items • Updated Feb 21 • 15
olmOCR Collection olmOCR is a document recognition pipeline for efficiently converting documents into plain text. olmocr.allenai.org • 4 items • Updated 7 days ago • 101
VideoLLaMA3 Collection Frontier Multimodal Foundation Models for Video Understanding • 14 items • Updated 15 days ago • 14
Ovis2 Collection Our latest advancement in multi-modal large language models (MLLMs) • 15 items • Updated 1 day ago • 56
Breeze 2 Family Collection Llama-Breeze2 is a multi-modal language model family specifically intended for Traditional Chinese use. BreezyVoice is a Taiwan Mandarin TTS • 6 items • Updated 29 days ago • 18
CritiqueFineTuning Collection The dataset and models for CritiqueFineTuning • 4 items • Updated Feb 2 • 2
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate Paper • 2501.17703 • Published Jan 29 • 57
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models Paper • 2410.07985 • Published Oct 10, 2024 • 32