Preference Data
  Dahoas/full-hh-rlhf • Updated Feb 23, 2023 • 125k • 1.08k • 83
  HuggingFaceH4/ultrafeedback_binarized • Updated Oct 16, 2024 • 187k • 8.85k • 298
  PKU-Alignment/PKU-SafeRLHF • Updated Oct 18, 2024 • 164k • 3.67k • 145
  Skywork/Skywork-Reward-Preference-80K-v0.2 • Updated Oct 25, 2024 • 77k • 563 • 55
Yifan's PPO Models
  lblaoke/llama2-7b-ppo-human • 7B • Updated Feb 3 • 2
  lblaoke/llama2-7b-ppo-self • 7B • Updated Feb 3 • 2
  lblaoke/llama2-7b-ppo-self-human • 7B • Updated Feb 3 • 2
  lblaoke/mistral-v0.1-7b-ppo-human • 7B • Updated Feb 4 • 1
Draft Models
  lblaoke/qwama-0.5b-skywork-pref-dpo-llama-factory-v1 • 0.5B • Updated Mar 19 • 3
  lblaoke/qwama-0.5b-skywork-pref-dpo-trl-v1 • 0.5B • Updated Mar 19 • 3
  lblaoke/qwama-0.5b-skywork-pref-dpo-trl-v2 • 0.5B • Updated Mar 21 • 4
  lblaoke/qwama-0.5b-skywork-pref-sft-rejected-trl-v3 • 0.5B • Updated Mar 28 • 3
Yifan's RMs
  lblaoke/mistral-v0.3-7b-rm-self-human • Text Classification • 7B • Updated Jan 14 • 2
  lblaoke/mistral-v0.3-7b-rm-self • Text Classification • 7B • Updated Jan 14 • 3
  lblaoke/mistral-v0.3-7b-rm-human • Text Classification • 7B • Updated Jan 14 • 2
  lblaoke/mistral-v0.1-7b-rm-self-human • Text Classification • 7B • Updated Jan 14 • 2