RuiyangSun committed
Commit 5ea6c15
Parent(s): 8def050
docs: update readme
README.md (changed)
@@ -34,6 +34,7 @@ It can play a role in the safe RLHF algorithm, helping the Beaver model become m
 - **Beaver:** <https://huggingface.co/PKU-Alignment/beaver-7b-v1.0>
 - **Dataset:** <https://huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF>
 - **Reward Model:** <https://huggingface.co/PKU-Alignment/beaver-7b-v1.0-reward>
+- **Cost Model:** <https://huggingface.co/PKU-Alignment/beaver-7b-v1.0-cost>
 - **Paper:** *Coming soon...*
 
 ## How to Use the Reward Model