Lin-K76 committed
Commit 72e9c41
Parent: 1512d23

Update README.md

Files changed (1)
  1. README.md +9 -9
README.md CHANGED
@@ -25,7 +25,7 @@ language:
 - **Model Developers:** Neural Magic
 
 Quantized version of [Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct).
-It achieves an average score of 72.33 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1), whereas the unquantized model achieves 73.11.
+It achieves an average score of 73.67 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1), whereas the unquantized model achieves 74.17.
 
 ### Model Optimizations
 
@@ -162,7 +162,7 @@ oneshot(
 ## Evaluation
 
 The model was evaluated on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) leaderboard tasks (version 1) with the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) and the [vLLM](https://docs.vllm.ai/en/stable/) engine, using the following command.
-A modified version of ARC-C was used for evaluations, in line with Llama 3.1's prompting.
+A modified version of ARC-C and GSM8k-cot was used for evaluations, in line with Llama 3.1's prompting. It can be accessed on the [Neural Magic fork of the lm-evaluation-harness](https://github.com/neuralmagic/lm-evaluation-harness/tree/llama_3.1_instruct).
 ```
 lm_eval \
   --model vllm \
@@ -206,13 +206,13 @@ lm_eval \
   </td>
  </tr>
  <tr>
-  <td>GSM-8K (5-shot, strict-match)
+  <td>GSM-8K-cot (8-shot, strict-match)
   </td>
-  <td>75.66
+  <td>82.03
   </td>
-  <td>73.77
+  <td>81.80
   </td>
-  <td>97.50%
+  <td>99.72%
   </td>
  </tr>
  <tr>
@@ -248,11 +248,11 @@ lm_eval \
 <tr>
  <td><strong>Average</strong>
  </td>
- <td><strong>73.11</strong>
+ <td><strong>74.17</strong>
  </td>
- <td><strong>72.33</strong>
+ <td><strong>73.67</strong>
  </td>
- <td><strong>98.94%</strong>
+ <td><strong>99.33%</strong>
 </td>
 </tr>
 </table>
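
The changed hunks show only the first two lines of the evaluation command; the rest lies outside the diff context. A plausible full invocation, sketched from the surrounding description (OpenLLM v1 task group, vLLM backend), is below. The checkpoint id and the extra `model_args` (`dtype`, `add_bos_token`, `max_model_len`) are assumptions for illustration, not taken from this commit:

```
# Hypothetical reconstruction -- only "lm_eval --model vllm" appears in the diff.
# <quantized-model-id> is a placeholder for the actual Hugging Face repo id.
lm_eval \
  --model vllm \
  --model_args pretrained="<quantized-model-id>",dtype=auto,add_bos_token=True,max_model_len=4096 \
  --tasks openllm \
  --batch_size auto
```

The table's Recovery column is the quantized score as a percentage of the unquantized score: the updated averages give 73.67 / 74.17 ≈ 99.33%, and the new GSM-8K-cot row gives 81.80 / 82.03 ≈ 99.72%, matching the committed values.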