RTO-RL/Llama3-8B-RTO

zkshan2002 committed on Feb 11

Commit 7d45fa0 · verified · 1 parent: 76e1665

Update README.md

Files changed (1)

README.md +5 -1

@@ -4,6 +4,10 @@ datasets:
 base_model:
 - OpenRLHF/Llama-3-8b-sft-mixture
 ---
+Base Model: [OpenRLHF/Llama-3-8b-sft-mixture](https://huggingface.co/OpenRLHF/Llama-3-8b-sft-mixture)
 DPO model: [RTO-RL/Llama3-8B-DPO](https://huggingface.co/RTO-RL/Llama3-8B-DPO)
-Reward model: [RTO-RL/Llama3.2-1B-RewardModel](https://huggingface.co/RTO-RL/Llama3.2-1B-RewardModel)
+Reward model: [RTO-RL/Llama3.2-1B-RewardModel](https://huggingface.co/RTO-RL/Llama3.2-1B-RewardModel)
+Prompt dataset: [weqweasdas/ultra_train](https://huggingface.co/datasets/weqweasdas/ultra_train)