sfairXC/FsfairX-LLaMA3-RM-v0.1

hendrydong committed on Apr 20, 2024

Commit a82a31c · verified · 1 Parent(s): f3760a7

Update README.md

Files changed (1)

README.md +5 -0

@@ -56,6 +56,11 @@ This Reward model is the SOTA open-source RM (Apr 20, 2024) on Reward-Bench.
 You can also refer to our short blog for RM training details: https://www.notion.so/Reward-Modeling-for-RLHF-abe03f9afdac42b9a5bee746844518d0.
+## Contact
+Please contact hanze.dong AT salesforce.com if you have any questions.
 ## References
 The repo was part of the iterative rejection sampling fine-tuning and iterative DPO. If you find the content of this repo useful in your work, please consider cite it as follows: