TÜLU 3: Pushing Frontiers in Open Language Model Post-Training Paper • 2411.15124 • Published Nov 22, 2024 • 58
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization Paper • 2411.10442 • Published Nov 15, 2024 • 72
IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization Paper • 2411.06208 • Published Nov 9, 2024 • 19
LOGO -- Long cOntext aliGnment via efficient preference Optimization Paper • 2410.18533 • Published Oct 24, 2024 • 42
Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback Paper • 2410.19133 • Published Oct 24, 2024 • 11
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19, 2024 • 136
Towards a Unified View of Preference Learning for Large Language Models: A Survey Paper • 2409.02795 • Published Sep 4, 2024 • 72
From DeepSpeed to FSDP and Back Again with Hugging Face Accelerate Article • Published Jun 13, 2024 • 45
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024 • 124
Course-Correction: Safety Alignment Using Synthetic Preferences Paper • 2407.16637 • Published Jul 23, 2024 • 26
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents Paper • 2407.16741 • Published Jul 23, 2024 • 70
Towards Building Specialized Generalist AI with System 1 and System 2 Fusion Paper • 2407.08642 • Published Jul 11, 2024 • 9