zkshan2002 committed on
Commit 76e1665 · verified · 1 Parent(s): 71c49be

Update README.md

Files changed (1)
  1. README.md +4 -5
README.md CHANGED
@@ -3,8 +3,7 @@ datasets:
  - weqweasdas/ultra_train
  base_model:
  - OpenRLHF/Llama-3-8b-sft-mixture
- reward_model:
- - zkshan2002/r1B-sft_tokenizer
- dpo_model:
- - zkshan2002/DPO-uf-llama3-8B-OpenRLHF
- ---
+ ---
+ DPO model: [RTO-RL/Llama3-8B-DPO](https://huggingface.co/RTO-RL/Llama3-8B-DPO)
+
+ Reward model: [RTO-RL/Llama3.2-1B-RewardModel](https://huggingface.co/RTO-RL/Llama3.2-1B-RewardModel)