Offline Reinforcement Learning for LLM Multi-Step Reasoning • Paper • 2412.16145 • Published 7 days ago • 33
MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs • Paper • 2410.04698 • Published Oct 7 • 13
Spurious Feature Diversification Improves Out-of-distribution Generalization • Paper • 2309.17230 • Published Sep 29, 2023
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint • Paper • 2312.11456 • Published Dec 18, 2023 • 1
Weakly Supervised Disentangled Generative Causal Representation Learning • Paper • 2010.02637 • Published Oct 6, 2020
LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models • Paper • 2306.12420 • Published Jun 21, 2023 • 2
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment • Paper • 2304.06767 • Published Apr 13, 2023 • 2