RLHFlow

university

AI & ML interests

Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/

Collections 11

Models 29

RLHFlow/Qwen2.5-Math-7B-Zero-RAFTpp

Text Generation • 8B • Updated May 21 • 7 • 1

RLHFlow/Qwen2.5-Math-7B-Zero-Reinforce-Rej

Text Generation • 8B • Updated May 21 • 4 • 1

RLHFlow/Llama3.1-8B-PRM-Deepseek-Data

Text Generation • 8B • Updated May 10 • 3.1k • 36

RLHFlow/Qwen2.5-7B-SFT

8B • Updated Feb 17 • 4

RLHFlow/Qwen2.5-7B-RAFT-Zero

8B • Updated Feb 17 • 4

RLHFlow/Qwen2.5-7B-DPO-NLL-Zero

8B • Updated Feb 17 • 3

RLHFlow/Qwen2.5-7B-DPO-Zero

8B • Updated Feb 17 • 3

RLHFlow/Qwen2.5-7B-DPO

8B • Updated Feb 17 • 475

RLHFlow/Qwen2.5-7B-PPO-Zero

8B • Updated Feb 17 • 21 • 2

RLHFlow/Decision-Tree-Reward-Gemma-2-27B

Text Classification • 27B • Updated Jan 24 • 58 • 7

Datasets 83

RLHFlow/self_rewarding_turn2_example

Updated Mar 2 • 4

RLHFlow/self_rewarding_turn1_with_rewards_example

Updated Mar 2 • 6

RLHFlow/self_rewarding_rl_prompt

Updated Mar 2 • 2

RLHFlow/self_rewarding_sft_prompt

Viewer • Updated Mar 2 • 40k • 8

RLHFlow/self_rewarding_ift_example_raw_data1

Viewer • Updated Feb 26 • 16.3k • 2

RLHFlow/self_rewarding_ift_example

Viewer • Updated Feb 26 • 32k • 11

RLHFlow/qwq_gen_sft_15k

Viewer • Updated Feb 17 • 15k • 19

RLHFlow/numia_prompt_ppo

Viewer • Updated Feb 13 • 404k • 7 • 1

RLHFlow/numia_prompt_dpo_test

Viewer • Updated Feb 11 • 1.02k • 6

RLHFlow/numia_prompt_dpo9

Viewer • Updated Feb 11 • 20k • 1
