- Training Language Models to Self-Correct via Reinforcement Learning
  Paper • 2409.12917 • Published • 136
- Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models
  Paper • 2409.18943 • Published • 28
- From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
  Paper • 2411.16594 • Published • 37
- Offline Reinforcement Learning for LLM Multi-Step Reasoning
  Paper • 2412.16145 • Published • 37
Yuan
MinakamiYuki
AI & ML interests: None yet
Recent Activity
- updated a collection 9 days ago: LLM paper
- updated a collection 15 days ago: LLM paper
- liked a dataset about 1 month ago: ManTle/mops
Organizations: None yet
Collections: 1
Models: None public yet
Datasets: None public yet