hendrydong
committed on
Update README.md
README.md
CHANGED
@@ -56,6 +56,11 @@ This Reward model is the SOTA open-source RM (Apr 20, 2024) on Reward-Bench.
 
 You can also refer to our short blog for RM training details: https://www.notion.so/Reward-Modeling-for-RLHF-abe03f9afdac42b9a5bee746844518d0.
 
+
+## Contact
+
+Please contact hanze.dong AT salesforce.com if you have any questions.
+
 ## References
 The repo was part of the iterative rejection sampling fine-tuning and iterative DPO. If you find the content of this repo useful in your work, please consider cite it as follows:
 