NILE: Internal Consistency Alignment in Large Language Models Paper • 2412.16686 • Published Dec 21, 2024 • 8
Offline Reinforcement Learning for LLM Multi-Step Reasoning Paper • 2412.16145 • Published Dec 20, 2024 • 39
AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling Paper • 2412.15084 • Published Dec 19, 2024 • 13
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models Paper • 2503.16419 • Published Mar 20, 2025 • 76
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't Paper • 2503.16219 • Published Mar 20, 2025 • 51
AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset Paper • 2504.16891 • Published Apr 23, 2025 • 24
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities Paper • 2504.16078 • Published Apr 22, 2025 • 20
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs Paper • 2504.11536 • Published Apr 15, 2025 • 61
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning Paper • 2504.11456 • Published Apr 15, 2025 • 13
AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale Paper • 2505.08311 • Published May 13, 2025 • 18
ATLAS: Learning to Optimally Memorize the Context at Test Time Paper • 2505.23735 • Published May 29, 2025 • 23
Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning Paper • 2505.20561 • Published May 26, 2025 • 7
Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models Paper • 2508.10751 • Published Aug 2025 • 21