nm-research committed
Commit c70235c · verified · 1 Parent(s): 7fd5827

Update README.md

Files changed (1)
  1. README.md +17 -17
README.md CHANGED
````diff
@@ -68,7 +68,7 @@ This model was created with [llm-compressor](https://github.com/vllm-project/llm
 
 
 ```bash
-python quantize.py --model_path ibm-granite/granite-3.1-2b-instruct --quant_path "output_dir/granite-3.1-2b-instruct-quantized.w4a16" --calib_size 2048 --dampening_frac 0.1 --observer mse
+python quantize.py --model_path ibm-granite/granite-3.1-2b-instruct --quant_path "output_dir/granite-3.1-2b-instruct-quantized.w4a16" --calib_size 1024 --dampening_frac 0.01 --observer mse --group_size 64
 ```
 
 
````
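The hunk above only changes quantization hyperparameters (fewer calibration samples, lighter Hessian dampening, an explicit group size). The `quantize.py` script itself is not part of this commit, so as context here is a minimal, hypothetical sketch of what a script with these flags might look like, assuming llm-compressor's `oneshot()`/`GPTQModifier` API; the calibration dataset and the wiring of `--observer`/`--group_size` into the recipe are assumptions, not confirmed by the diff.

```python
# Hypothetical sketch of a quantize.py matching the flags in the diff above.
# Assumes llm-compressor's oneshot() + GPTQModifier API; names and defaults
# here are illustrative only.
import argparse

from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

parser = argparse.ArgumentParser()
parser.add_argument("--model_path", required=True)
parser.add_argument("--quant_path", required=True)
parser.add_argument("--calib_size", type=int, default=1024)
parser.add_argument("--dampening_frac", type=float, default=0.01)
parser.add_argument("--observer", default="mse")           # consumed by the real recipe config
parser.add_argument("--group_size", type=int, default=64)  # likewise; the stock W4A16 scheme defaults to 128
args = parser.parse_args()

model = AutoModelForCausalLM.from_pretrained(
    args.model_path, torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(args.model_path)

# GPTQ W4A16: 4-bit grouped weight quantization, activations kept in 16-bit.
# dampening_frac scales the diagonal dampening added to the GPTQ Hessian for
# numerical stability; this commit lowers it from 0.1 to 0.01.
recipe = GPTQModifier(
    targets="Linear",
    scheme="W4A16",
    ignore=["lm_head"],
    dampening_frac=args.dampening_frac,
)

oneshot(
    model=model,
    dataset="open_platypus",                  # assumed calibration set, not stated in the diff
    recipe=recipe,
    num_calibration_samples=args.calib_size,  # commit lowers 2048 -> 1024
    max_seq_length=2048,
)

model.save_pretrained(args.quant_path, save_compressed=True)
tokenizer.save_pretrained(args.quant_path)
```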
````diff
@@ -191,26 +191,26 @@ evalplus.evaluate \
 
 | Metric | ibm-granite/granite-3.1-2b-instruct | neuralmagic-ent/granite-3.1-2b-instruct-quantized.w4a16 |
 |-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
-| ARC-Challenge (Acc-Norm, 25-shot) | 55.63 | 54.18 |
-| GSM8K (Strict-Match, 5-shot) | 60.96 | 63.61 |
-| HellaSwag (Acc-Norm, 10-shot) | 75.21 | 73.07 |
-| MMLU (Acc, 5-shot) | 54.38 | 52.88 |
-| TruthfulQA (MC2, 0-shot) | 55.93 | 58.30 |
-| Winogrande (Acc, 5-shot) | 69.67 | 69.77 |
-| **Average Score** | **61.98** | **61.97** |
-| **Recovery** | **100.00** | **99.98** |
+| ARC-Challenge (Acc-Norm, 25-shot) | 55.63 | 54.18 |
+| GSM8K (Strict-Match, 5-shot) | 60.96 | 62.85 |
+| HellaSwag (Acc-Norm, 10-shot) | 75.21 | 73.36 |
+| MMLU (Acc, 5-shot) | 54.38 | 52.17 |
+| TruthfulQA (MC2, 0-shot) | 55.93 | 56.83 |
+| Winogrande (Acc, 5-shot) | 69.67 | 69.85 |
+| **Average Score** | **61.98** | **61.54** |
+| **Recovery** | **100.00** | **99.29** |
 
 #### OpenLLM Leaderboard V2 evaluation scores
 | Metric | ibm-granite/granite-3.1-2b-instruct | neuralmagic-ent/granite-3.1-2b-instruct-quantized.w4a16 |
 |-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
-| IFEval (Inst Level Strict Acc, 0-shot)| 67.99 | 56.12 |
-| BBH (Acc-Norm, 3-shot) | 44.11 | 41.32 |
-| Math-Hard (Exact-Match, 4-shot) | 8.66 | 7.38 |
-| GPQA (Acc-Norm, 0-shot) | 28.30 | 27.64 |
-| MUSR (Acc-Norm, 0-shot) | 35.12 | 33.95 |
-| MMLU-Pro (Acc, 5-shot) | 26.87 | 25.70 |
-| **Average Score** | **35.17** | **32.02** |
-| **Recovery** | **100.00** | **91.03** |
+| IFEval (Inst Level Strict Acc, 0-shot)| 67.99 | 67.63 |
+| BBH (Acc-Norm, 3-shot) | 44.11 | 43.22 |
+| Math-Hard (Exact-Match, 4-shot) | 8.66 | 8.77 |
+| GPQA (Acc-Norm, 0-shot) | 28.30 | 28.56 |
+| MUSR (Acc-Norm, 0-shot) | 35.12 | 35.26 |
+| MMLU-Pro (Acc, 5-shot) | 26.87 | 27.27 |
+| **Average Score** | **35.17** | **35.12** |
+| **Recovery** | **100.00** | **99.84** |
 
 
 #### HumanEval pass@1 scores
````
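For readers checking the updated tables, the **Recovery** row is simply the quantized model's average score as a percentage of the baseline's. A one-line check against the updated V1 averages follows; the V2 row is computed the same way from unrounded per-task scores, so a hand calculation from the rounded table values can differ in the last digit.

```python
# Recovery = quantized average / baseline average, in percent,
# using the updated OpenLLM V1 averages from the first table.
quant_avg, base_avg = 61.54, 61.98
print(f"{100 * quant_avg / base_avg:.2f}%")  # 99.29% -- matches the Recovery row
```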
 