nm-research committed
Commit 2379f83 · verified · 1 Parent(s): 2cb656c

Update README.md

Files changed (1)
  1. README.md +76 -2
README.md CHANGED
@@ -28,7 +28,7 @@ tags:
  - **Model Developers:** Neural Magic
 
  Quantized version of [Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct).
- It achieves an average score of 43.38 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1), whereas the unquantized model achieves 43.64.
+ It achieves an average score of 43.38 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark version 1 and 23.42 on version 2, whereas the unquantized model achieves 43.64 on version 1 and 23.39 on version 2.
 
  ### Model Optimizations
 
@@ -95,6 +95,8 @@ lm_eval \
  </td>
  </tr>
  <tr>
+ <td rowspan="7" ><strong>OpenLLM v1</strong>
+ </td>
  <td>MMLU (5-shot)
  </td>
  <td>46.83
@@ -164,5 +166,77 @@ lm_eval \
  <td><strong>99.4%</strong>
  </td>
  </tr>
+ <tr>
+ <td rowspan="7" ><strong>OpenLLM v2</strong>
+ </td>
+ <td>MMLU-Pro (5-shot)
+ </td>
+ <td>17.49
+ </td>
+ <td>16.95
+ </td>
+ <td>96.9%
+ </td>
+ </tr>
+ <tr>
+ <td>IFEval (0-shot)
+ </td>
+ <td>31.17
+ </td>
+ <td>32.04
+ </td>
+ <td>102.8%
+ </td>
+ </tr>
+ <tr>
+ <td>BBH (3-shot)
+ </td>
+ <td>32.79
+ </td>
+ <td>32.51
+ </td>
+ <td>99.2%
+ </td>
+ </tr>
+ <tr>
+ <td>Math-lvl-5 (4-shot)
+ </td>
+ <td>0.21
+ </td>
+ <td>0.17
+ </td>
+ <td>***
+ </td>
+ </tr>
+ <tr>
+ <td>GPQA (0-shot)
+ </td>
+ <td>25.67
+ </td>
+ <td>26.12
+ </td>
+ <td>101.8%
+ </td>
+ </tr>
+ <tr>
+ <td>MuSR (0-shot)
+ </td>
+ <td>33.02
+ </td>
+ <td>32.75
+ </td>
+ <td>99.2%
+ </td>
+ </tr>
+ <tr>
+ <td><strong>Average</strong>
+ </td>
+ <td><strong>23.39</strong>
+ </td>
+ <td><strong>23.42</strong>
+ </td>
+ <td><strong>100.1%</strong>
+ </td>
+ </tr>
  </table>
-
+ *** Reference value too low to report meaningful recovery.
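The recovery column in the new OpenLLM v2 rows is simply the quantized score divided by the unquantized reference, and the bolded averages are per-column means over the six v2 tasks. A minimal Python sketch that reproduces the figures from the table values (small deviations, e.g. BBH printing 99.1% versus the reported 99.2%, likely come from rounding in the published scores):

```python
# Sanity check of the OpenLLM v2 rows: (unquantized reference, quantized) pairs
# copied from the table above; names are illustrative only.
v2_scores = {
    "MMLU-Pro (5-shot)":   (17.49, 16.95),
    "IFEval (0-shot)":     (31.17, 32.04),
    "BBH (3-shot)":        (32.79, 32.51),
    "Math-lvl-5 (4-shot)": (0.21, 0.17),   # reference too low, table reports ***
    "GPQA (0-shot)":       (25.67, 26.12),
    "MuSR (0-shot)":       (33.02, 32.75),
}

for task, (reference, quantized) in v2_scores.items():
    # e.g. MMLU-Pro: 100 * 16.95 / 17.49 ≈ 96.9%
    print(f"{task}: {100 * quantized / reference:.1f}% recovery")

ref_avg = sum(r for r, _ in v2_scores.values()) / len(v2_scores)    # ≈ 23.39
quant_avg = sum(q for _, q in v2_scores.values()) / len(v2_scores)  # ≈ 23.42
print(f"Average: {ref_avg:.2f} vs {quant_avg:.2f}, "
      f"{100 * quant_avg / ref_avg:.1f}% recovery")                 # ≈ 100.1%
```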