VideoWorld: Exploring Knowledge Learning from Unlabeled Videos Paper • 2501.09781 • Published Jan 16 • 27
Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30, 2024 • 77
Direct Preference Optimization: Your Language Model is Secretly a Reward Model Paper • 2305.18290 • Published May 29, 2023 • 55
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 354
Mamba: Linear-Time Sequence Modeling with Selective State Spaces Paper • 2312.00752 • Published Dec 1, 2023 • 143
Large Concept Models: Language Modeling in a Sentence Representation Space Paper • 2412.08821 • Published Dec 11, 2024 • 14
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context Paper • 1901.02860 • Published Jan 9, 2019 • 3
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention Paper • 2404.07143 • Published Apr 10, 2024 • 108
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? Paper • 2411.16489 • Published Nov 25, 2024 • 46
O1 Replication Journey: A Strategic Progress Report -- Part 1 Paper • 2410.18982 • Published Oct 8, 2024 • 3
Test-Time Training with Self-Supervision for Generalization under Distribution Shifts Paper • 1909.13231 • Published Sep 29, 2019 • 1
The Surprising Effectiveness of Test-Time Training for Abstract Reasoning Paper • 2411.07279 • Published Nov 11, 2024 • 3
Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models Paper • 2410.11081 • Published Oct 14, 2024 • 19