Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published 17 days ago • 62
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions Paper • 2411.14405 • Published Nov 21 • 58
Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training Paper • 2309.17179 • Published Sep 29, 2023 • 2
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model Paper • 2410.13639 • Published Oct 17 • 16
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? Paper • 2411.16489 • Published Nov 25 • 40
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning Paper • 2410.02884 • Published Oct 3 • 52
Tree of Problems: Improving structured problem solving with compositionality Paper • 2410.06634 • Published Oct 9 • 8
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling Paper • 2407.21787 • Published Jul 31 • 12
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters Paper • 2408.03314 • Published Aug 6 • 51
Offline Reinforcement Learning for LLM Multi-Step Reasoning Paper • 2412.16145 • Published 6 days ago • 33
The Surprising Effectiveness of Test-Time Training for Abstract Reasoning Paper • 2411.07279 • Published Nov 11 • 3
Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs Paper • 2410.18451 • Published Oct 24 • 15
Generative Verifiers: Reward Modeling as Next-Token Prediction Paper • 2408.15240 • Published Aug 27 • 13
Understanding Hidden Computations in Chain-of-Thought Reasoning Paper • 2412.04537 • Published 21 days ago
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners Paper • 2412.17256 • Published 4 days ago • 36
RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning Paper • 2410.02089 • Published Oct 2 • 12