This is the collection of the online-DPO-R1 project.
AI & ML interests
Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/
Collections: 10

Models (27)

RLHFlow/Qwen2.5-7B-SFT
RLHFlow/Qwen2.5-7B-RAFT-Zero
RLHFlow/Qwen2.5-7B-DPO-NLL-Zero
RLHFlow/Qwen2.5-7B-DPO-Zero
RLHFlow/Qwen2.5-7B-DPO
RLHFlow/Qwen2.5-7B-PPO-Zero
RLHFlow/Decision-Tree-Reward-Gemma-2-27B (Text Classification)
RLHFlow/Decision-Tree-Reward-Llama-3.1-8B (Text Classification)
RLHFlow/Llama3.1-8B-PRM-Mistral-Data (Text Generation)
RLHFlow/Llama3.1-8B-PRM-Deepseek-Data (Text Generation)

Datasets (79)

RLHFlow/self_rewarding_ift_example_raw_data1 (16.3k rows)
RLHFlow/self_rewarding_ift_example (32k rows)
RLHFlow/qwq_gen_sft_15k (15k rows)
RLHFlow/numia_prompt_ppo (404k rows)
RLHFlow/numia_prompt_dpo_test (1.02k rows)
RLHFlow/numia_prompt_dpo9 (20k rows)
RLHFlow/numia_prompt_dpo8 (20k rows)
RLHFlow/numia_prompt_dpo7 (20k rows)
RLHFlow/numia_prompt_dpo6 (20k rows)
RLHFlow/numia_prompt_dpo5 (20k rows)