Update README.md
README.md CHANGED
@@ -11,7 +11,7 @@ pipeline_tag: text-classification
This reward model achieves a score of 88.4 on RewardBench. It is finetuned from [Ray2333/GRM-Gemma2-2B-sftreg](https://huggingface.co/Ray2333/GRM-Gemma2-2B-sftreg) on the decontaminated [Skywork preference dataset v0.2](https://huggingface.co/datasets/Skywork/Skywork-Reward-Preference-80K-v0.2).

We obtain a **SOTA 2B reward model** that outperforms a series of 8B reward models and even surpasses GPT-4/Gemini as a judge.

- Check our GRM series at 🤗[hugging face](https://huggingface.co/collections/Ray2333/grm-66882bdf7152951779506c7b)
+ Check our GRM series at 🤗[hugging face](https://huggingface.co/collections/Ray2333/grm-66882bdf7152951779506c7b), our paper on [arXiv](https://arxiv.org/abs/2406.10216), and our code on [GitHub](https://github.com/YangRui2015/Generalizable-Reward-Model).
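Since the card is tagged `text-classification`, a minimal sketch of scoring a single chat response with this reward model via `transformers` is given below. The repo ID, dtype, and example messages are assumptions for illustration, not part of the diff; the card's own usage instructions take precedence.

```python
# Minimal sketch: score one assistant reply with the reward model.
# Assumptions: the repo ID below (placeholder for this card's model),
# bfloat16 weights, and a sequence-classification head with a single logit.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "Ray2333/GRM-Gemma2-2B-rewardmodel-ft"  # assumed ID for this card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain why the sky is blue."},
    {"role": "assistant", "content": "Sunlight scatters off air molecules, and shorter (blue) wavelengths scatter the most."},
]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

with torch.no_grad():
    # A higher scalar means the response is preferred under the learned reward.
    reward = model(input_ids).logits[0, 0].item()

print(f"reward: {reward:.3f}")
```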