RTO-RL/Llama3-8B-PPO


zkshan2002 committed on Feb 11

Commit f7897f4 (verified) · 1 Parent(s): bc49bd8

Create README.md

Files changed (1)

README.md +11 -0

README.md ADDED

	@@ -0,0 +1,11 @@

+---
+datasets:
+- weqweasdas/ultra_train
+base_model:
+- OpenRLHF/Llama-3-8b-sft-mixture
+---
+Base Model: [OpenRLHF/Llama-3-8b-sft-mixture](https://huggingface.co/OpenRLHF/Llama-3-8b-sft-mixture)
+Reward model: [RTO-RL/Llama3-8B-RewardModel](https://huggingface.co/RTO-RL/Llama3-8B-RewardModel)
+Prompt dataset: [weqweasdas/ultra_train](https://huggingface.co/datasets/weqweasdas/ultra_train)
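
The README lists the base model, reward model, and prompt dataset but gives no usage snippet. The following is a minimal sketch of how the policy checkpoint might be loaded with `transformers`. It assumes the repo id is `RTO-RL/Llama3-8B-PPO` (inferred from the page header, not stated in the README) and that the repo ships standard causal-LM weights and a tokenizer. `hub_url` and `load_policy` are hypothetical helper names introduced only for illustration.

```python
# Identifiers below are copied from the README, except POLICY_REPO, which is
# an assumption inferred from the page header (RTO-RL / Llama3-8B-PPO).
POLICY_REPO = "RTO-RL/Llama3-8B-PPO"
BASE_MODEL = "OpenRLHF/Llama-3-8b-sft-mixture"
REWARD_MODEL = "RTO-RL/Llama3-8B-RewardModel"
PROMPT_DATASET = "weqweasdas/ultra_train"


def hub_url(repo_id: str, is_dataset: bool = False) -> str:
    """Build the Hugging Face Hub page URL for a model or dataset repo."""
    prefix = "datasets/" if is_dataset else ""
    return f"https://huggingface.co/{prefix}{repo_id}"


def load_policy(repo_id: str = POLICY_REPO):
    """Load the tokenizer and causal-LM weights (a multi-gigabyte download).

    Assumes the repo contains a standard tokenizer and causal-LM checkpoint;
    the README does not confirm this.
    """
    # Imported lazily so the constants and hub_url work without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto")
    return tokenizer, model
```

Calling `tokenizer, model = load_policy()` would fetch the weights on first use. Keeping the import inside the function lets the repo ids and URLs be reused without pulling in `transformers`.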