Update README.md
README.md
CHANGED
@@ -3,8 +3,7 @@ datasets:
 - weqweasdas/ultra_train
 base_model:
 - OpenRLHF/Llama-3-8b-sft-mixture
-
--
-
--
----
+---
+DPO model: [RTO-RL/Llama3-8B-DPO](https://huggingface.co/RTO-RL/Llama3-8B-DPO)
+
+Reward model: [RTO-RL/Llama3.2-1B-RewardModel](https://huggingface.co/RTO-RL/Llama3.2-1B-RewardModel)