Commit 7d45fa0 (verified) by zkshan2002 · 1 parent: 76e1665

Update README.md

Files changed (1)
  1. README.md +5 -1
README.md CHANGED
@@ -4,6 +4,10 @@ datasets:
 base_model:
 - OpenRLHF/Llama-3-8b-sft-mixture
 ---
+Base Model: [OpenRLHF/Llama-3-8b-sft-mixture](https://huggingface.co/OpenRLHF/Llama-3-8b-sft-mixture)
+
 DPO model: [RTO-RL/Llama3-8B-DPO](https://huggingface.co/RTO-RL/Llama3-8B-DPO)
 
-Reward model: [RTO-RL/Llama3.2-1B-RewardModel](https://huggingface.co/RTO-RL/Llama3.2-1B-RewardModel)
+Reward model: [RTO-RL/Llama3.2-1B-RewardModel](https://huggingface.co/RTO-RL/Llama3.2-1B-RewardModel)
+
+Prompt dataset: [weqweasdas/ultra_train](https://huggingface.co/datasets/weqweasdas/ultra_train)