AI & ML interests
Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/
Collections (11)

Models (29)

RLHFlow/Qwen2.5-Math-7B-Zero-RAFTpp • Text Generation • 24 downloads • 1 like
RLHFlow/Qwen2.5-Math-7B-Zero-Reinforce-Rej • Text Generation • 21 downloads • 1 like
RLHFlow/Llama3.1-8B-PRM-Deepseek-Data • Text Generation • 11.8k downloads • 35 likes
RLHFlow/Qwen2.5-7B-SFT • 39 downloads
RLHFlow/Qwen2.5-7B-RAFT-Zero • 12 downloads
RLHFlow/Qwen2.5-7B-DPO-NLL-Zero • 17 downloads
RLHFlow/Qwen2.5-7B-DPO-Zero • 12 downloads
RLHFlow/Qwen2.5-7B-DPO • 9 downloads
RLHFlow/Qwen2.5-7B-PPO-Zero • 9 downloads • 2 likes
RLHFlow/Decision-Tree-Reward-Gemma-2-27B • Text Classification • 55 downloads • 7 likes

Datasets (83)

RLHFlow/self_rewarding_turn2_example • 31 downloads
RLHFlow/self_rewarding_turn1_with_rewards_example • 29 downloads
RLHFlow/self_rewarding_rl_prompt • 45 downloads
RLHFlow/self_rewarding_sft_prompt • Viewer • 40k rows • 33 downloads
RLHFlow/self_rewarding_ift_example_raw_data1 • Viewer • 16.3k rows • 26 downloads
RLHFlow/self_rewarding_ift_example • Viewer • 32k rows • 44 downloads
RLHFlow/qwq_gen_sft_15k • Viewer • 15k rows • 35 downloads
RLHFlow/numia_prompt_ppo • Viewer • 404k rows • 40 downloads • 1 like
RLHFlow/numia_prompt_dpo_test • Viewer • 1.02k rows • 33 downloads
RLHFlow/numia_prompt_dpo9 • Viewer • 20k rows • 29 downloads