Shifting Long-Context LLMs Research from Input to Output • Paper • 2503.04723 • Published 12 days ago • 19
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning • Paper • 2503.07572 • Published 8 days ago • 36
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL • Paper • 2503.07536 • Published 8 days ago • 77
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts • Paper • 2503.05447 • Published 11 days ago • 7
Forgetting Transformer: Softmax Attention with a Forget Gate • Paper • 2503.02130 • Published 15 days ago • 27
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization • Paper • 2503.04598 • Published 12 days ago • 17
PLDR-LLMs Learn A Generalizable Tensor Operator That Can Replace Its Own Deep Neural Net At Inference • Paper • 2502.13502 • Published 28 days ago • 3
Liger: Linearizing Large Language Models to Gated Recurrent Structures • Paper • 2503.01496 • Published 15 days ago • 15
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs • Paper • 2503.01307 • Published 16 days ago • 31
Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models • Paper • 2502.15499 • Published 25 days ago • 13