cleanrl (cleanrl)

ArashAhmadian

authored a paper 5 months ago

Command A: An Enterprise-Ready Large Language Model

Paper • 2504.00698 • Published Apr 1 • 27

vwxyzjn

authored 5 papers 7 months ago

The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization

Paper • 2403.17031 • Published Mar 24, 2024 • 6

ArashAhmadian

authored a paper 8 months ago

If You Can't Use Them, Recycle Them: Optimizing Merging at Scale Mitigates Performance Tradeoffs

Paper • 2412.04144 • Published Dec 5, 2024 • 5

ArashAhmadian

authored 3 papers about 1 year ago

RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs

Paper • 2407.02552 • Published Jul 2, 2024 • 4

Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning

Paper • 2309.05444 • Published Sep 11, 2023 • 1

Self-Improving Robust Preference Optimization

Paper • 2406.01660 • Published Jun 3, 2024 • 20

vwxyzjn

updated 3 models about 1 year ago

cleanrl/EleutherAI_pythia-6.9b-deduped__ppo__tldr

Text Generation • 7B • Updated May 30, 2024 • 4

cleanrl/EleutherAI_pythia-2.8b-deduped__ppo__tldr

Text Generation • 3B • Updated May 30, 2024 • 7

cleanrl/EleutherAI_pythia-1b-deduped__ppo__tldr

Text Generation • 1B • Updated May 30, 2024 • 8

ArashAhmadian

authored a paper about 1 year ago

Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

Paper • 2402.14740 • Published Feb 22, 2024 • 13

vwxyzjn

updated 6 models over 1 year ago

cleanrl/EleutherAI_pythia-2.8b-deduped__reward__tldr

Text Classification • Updated May 15, 2024 • 86

cleanrl/EleutherAI_pythia-1b-deduped__reward__tldr

Text Classification • Updated May 15, 2024 • 2.57k

cleanrl/EleutherAI_pythia-1b-deduped__sft__tldr

Text Generation • Updated May 15, 2024 • 2.98k

cleanrl/EleutherAI_pythia-2.8b-deduped__sft__tldr

Text Generation • Updated May 15, 2024 • 957

cleanrl/EleutherAI_pythia-6.9b-deduped__sft__tldr

Text Generation • Updated May 15, 2024 • 7

cleanrl/EleutherAI_pythia-6.9b-deduped__reward__tldr

Text Classification • Updated May 7, 2024 • 20

Team members 6