---
datasets:
- weqweasdas/ultra_train
base_model:
- OpenRLHF/Llama-3-8b-sft-mixture
---
- Base model: [OpenRLHF/Llama-3-8b-sft-mixture](https://huggingface.co/OpenRLHF/Llama-3-8b-sft-mixture)
- Reward model: [RTO-RL/Llama3-8B-RewardModel](https://huggingface.co/RTO-RL/Llama3-8B-RewardModel)
- Prompt dataset: [weqweasdas/ultra_train](https://huggingface.co/datasets/weqweasdas/ultra_train)