Update pipeline tag to text-ranking and add descriptive tags (#3)
- Update pipeline tag to text-ranking and add descriptive tags (b38f9139057875a729ef5f4606a83fa74fce66ad)
Co-authored-by: Niels Rogge <[email protected]>
README.md
CHANGED
@@ -1,8 +1,14 @@
 ---
-license: apache-2.0
-pipeline_tag: text-generation
 library_name: transformers
+license: apache-2.0
+pipeline_tag: text-ranking
 paper: 2507.09104
+language: en
+tags:
+- judge-model
+- evaluation
+- reward-modeling
+- text-ranking
 ---
 
 # CompassJudger-2
@@ -113,7 +119,7 @@ CompassJudger-2 sets a new state-of-the-art for judge models, outperforming gene
 
 | Model | JudgerBench V2 | JudgeBench | RMB | RewardBench | Average |
 | :--------------------------------- | :------------: | :--------: | :-------: | :---------: | :-------: |
-| **7B Judge Models** | | | | |
+| **7B Judge Models** | | | | | |\
 | CompassJudger-1-7B-Instruct | 57.96 | 46.00 | 38.18 | 80.74 | 55.72 |
 | Con-J-7B-Instruct | 52.35 | 38.06 | 71.50 | 87.10 | 62.25 |
 | RISE-Judge-Qwen2.5-7B | 46.12 | 40.48 | 72.64 | 88.20 | 61.61 |
@@ -129,7 +135,7 @@ CompassJudger-2 sets a new state-of-the-art for judge models, outperforming gene
 | Qwen3-235B-A22B | 61.40 | 65.97 | 75.59 | 84.68 | 71.91 |
 
 
-For detailed benchmark performance and methodology, please refer to our
+For detailed benchmark performance and methodology, please refer to our 📑 [Paper](https://arxiv.org/abs/2507.09104).
 
 ## License
 
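
To sanity-check the metadata after this change, the front matter can be read back and inspected. The snippet below is a minimal sketch, not part of the commit: `parse_front_matter` and `CARD` are hypothetical names introduced here, the card text is copied from the new side of the first hunk, and the parser only handles the flat `key: value` pairs and `- item` lists that appear in this card (it is not a general YAML parser).

```python
# Minimal sketch: read the flat YAML front matter of a model card.
# Assumption: only `key: value` pairs and `- item` lists occur, as in this card.

CARD = """\
---
library_name: transformers
license: apache-2.0
pipeline_tag: text-ranking
paper: 2507.09104
language: en
tags:
- judge-model
- evaluation
- reward-modeling
- text-ranking
---

# CompassJudger-2
"""


def parse_front_matter(text):
    """Return the front matter between the first two '---' lines as a dict."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter block at the top of the file
    meta = {}
    current_key = None  # key whose list items are currently being collected
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter ends the front matter
            break
        if not stripped:
            continue
        if stripped.startswith("- ") and current_key is not None:
            meta[current_key].append(stripped[2:].strip())
            continue
        key, _, value = stripped.partition(":")
        key, value = key.strip(), value.strip()
        if value:
            meta[key] = value
            current_key = None
        else:
            meta[key] = []  # an empty value opens a list
            current_key = key
    return meta


meta = parse_front_matter(CARD)
print(meta["pipeline_tag"])
print(meta["tags"])
```

For real model cards a full YAML parser is the safer choice; this hand-rolled version is only meant to make the effect of the hunk easy to check without extra dependencies.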