PKU-Alignment/beaver-7b-v1.0-reward


RuiyangSun committed on Jul 10, 2023

Commit 5ea6c15 · 1 Parent(s): 8def050

docs: update readme

Files changed (1)

README.md +1 -0


@@ -34,6 +34,7 @@ It can play a role in the safe RLHF algorithm, helping the Beaver model become m
 - **Beaver:** <https://huggingface.co/PKU-Alignment/beaver-7b-v1.0>
 - **Dataset:** <https://huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF>
 - **Reward Model:** <https://huggingface.co/PKU-Alignment/beaver-7b-v1.0-reward>
+- **Cost Model:** <https://huggingface.co/PKU-Alignment/beaver-7b-v1.0-cost>
 - **Paper:** *Coming soon...*
 ## How to Use the Reward Model