Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention • Paper • 2502.11089 • Published 5 days ago • 129
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models • Paper • 2502.07346 • Published 10 days ago • 49
Expect the Unexpected: FailSafe Long Context QA for Finance • Paper • 2502.06329 • Published 11 days ago • 123
Teaching Language Models to Critique via Reinforcement Learning • Paper • 2502.03492 • Published 17 days ago • 23
Scaling Pre-training to One Hundred Billion Data for Vision Language Models • Paper • 2502.07617 • Published 10 days ago • 27
SynthDetoxM: Modern LLMs are Few-Shot Parallel Detoxification Data Annotators • Paper • 2502.06394 • Published 11 days ago • 85
Competitive Programming with Large Reasoning Models • Paper • 2502.06807 • Published 18 days ago • 60
Wolf: Captioning Everything with a World Summarization Framework • Paper • 2407.18908 • Published Jul 26, 2024 • 32
Scaling Synthetic Data Creation with 1,000,000,000 Personas • Paper • 2406.20094 • Published Jun 28, 2024 • 97
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions • Paper • 2406.04325 • Published Jun 6, 2024 • 73