RWKV-7 "Goose" with Expressive Dynamic State Evolution Paper • 2503.14456 • Published 2 days ago
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models Paper • 2503.09573 • Published 8 days ago
EuroBERT: Scaling Multilingual Encoders for European Languages Paper • 2503.05500 • Published 13 days ago
Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published 15 days ago
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs Paper • 2503.01743 • Published 17 days ago
Predictive Data Selection: The Data That Predicts Is the Data That Teaches Paper • 2503.00808 • Published 19 days ago
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published 28 days ago
SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models Paper • 2502.09604 • Published Feb 13
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU Paper • 2502.08910 • Published Feb 13
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Paper • 2501.17161 • Published Jan 28
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate Paper • 2501.17703 • Published Jan 29
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training Paper • 2501.11425 • Published Jan 20