Update README.md
README.md CHANGED
@@ -11,7 +11,7 @@ pipeline_tag: text-classification
This reward model achieves a score of 88.4 on RewardBench. It is finetuned from [Ray2333/GRM-Gemma2-2B-sftreg](https://huggingface.co/Ray2333/GRM-Gemma2-2B-sftreg) on the decontaminated [Skywork preference dataset v0.2](https://huggingface.co/datasets/Skywork/Skywork-Reward-Preference-80K-v0.2).

We obtain a **SOTA 2B reward model** that outperforms a series of 8B reward models and even surpasses GPT-4/Gemini as a judge.

- Check our GRM series at 🤗[hugging face](https://huggingface.co/collections/Ray2333/grm-66882bdf7152951779506c7b)
+ Check our GRM series at 🤗[hugging face](https://huggingface.co/collections/Ray2333/grm-66882bdf7152951779506c7b), our paper on [arXiv](https://arxiv.org/abs/2406.10216), and our code on [GitHub](https://github.com/YangRui2015/Generalizable-Reward-Model).
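Since the card is tagged `text-classification`, a minimal sketch of scoring a single chat response with this reward model via `transformers` is given below. The repo ID, dtype, and example messages are assumptions for illustration, not part of the diff; the card's own usage instructions take precedence.

```python
# Minimal sketch: score one assistant reply with the reward model.
# Assumptions: the repo ID below (placeholder for this card's model),
# bfloat16 weights, and a sequence-classification head with a single logit.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "Ray2333/GRM-Gemma2-2B-rewardmodel-ft"  # assumed ID for this card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain why the sky is blue."},
    {"role": "assistant", "content": "Sunlight scatters off air molecules, and shorter (blue) wavelengths scatter the most."},
]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

with torch.no_grad():
    # A higher scalar means the response is preferred under the learned reward.
    reward = model(input_ids).logits[0, 0].item()

print(f"reward: {reward:.3f}")
```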