Update README.md
README.md CHANGED
@@ -68,7 +68,7 @@ This model was created with [llm-compressor](https://github.com/vllm-project/llm
 
 
 ```bash
-python quantize.py --model_path ibm-granite/granite-3.1-2b-instruct --quant_path "output_dir/granite-3.1-2b-instruct-quantized.w4a16" --calib_size
+python quantize.py --model_path ibm-granite/granite-3.1-2b-instruct --quant_path "output_dir/granite-3.1-2b-instruct-quantized.w4a16" --calib_size 1024 --dampening_frac 0.01 --observer mse --group_size 64
 ```
 
 
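The updated command pins four GPTQ hyperparameters: 1024 calibration samples, a Hessian dampening fraction of 0.01, an MSE scale observer, and a weight group size of 64. A minimal sketch of how a `quantize.py` with this interface could wire those flags into llm-compressor's one-shot GPTQ path is below; only the flag names and values come from the diff, while the script body, calibration dataset, and sequence length are assumptions.

```python
# Hypothetical reconstruction of quantize.py. Only the CLI flags come from
# the command in the diff; everything else (dataset, sequence length,
# structure) is an assumption modeled on llm-compressor's GPTQ examples.
import argparse

from compressed_tensors.quantization import (
    QuantizationArgs,
    QuantizationScheme,
    QuantizationStrategy,
    QuantizationType,
)
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot
from transformers import AutoModelForCausalLM, AutoTokenizer

parser = argparse.ArgumentParser()
parser.add_argument("--model_path", required=True)
parser.add_argument("--quant_path", required=True)
parser.add_argument("--calib_size", type=int, default=1024)
parser.add_argument("--dampening_frac", type=float, default=0.01)
parser.add_argument("--observer", default="mse")
parser.add_argument("--group_size", type=int, default=64)
args = parser.parse_args()

model = AutoModelForCausalLM.from_pretrained(args.model_path, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(args.model_path)

# W4A16: int4 grouped weight-only quantization; activations stay in 16-bit.
recipe = GPTQModifier(
    targets=["Linear"],
    ignore=["lm_head"],
    dampening_frac=args.dampening_frac,  # Hessian dampening for numerical stability
    config_groups={
        "group_0": QuantizationScheme(
            targets=["Linear"],
            weights=QuantizationArgs(
                num_bits=4,
                type=QuantizationType.INT,
                symmetric=True,
                strategy=QuantizationStrategy.GROUP,
                group_size=args.group_size,
                # "mse" searches for scales minimizing quantization error
                # instead of taking the raw min/max of each group.
                observer=args.observer,
            ),
        ),
    },
)

oneshot(
    model=model,
    dataset="open_platypus",  # stand-in; the actual calibration set isn't shown
    recipe=recipe,
    max_seq_length=8192,      # assumed; pick per available memory/context
    num_calibration_samples=args.calib_size,
)

model.save_pretrained(args.quant_path, save_compressed=True)
tokenizer.save_pretrained(args.quant_path)
```

The MSE observer and the small dampening fraction both trade a little extra calibration time for more stable int4 scales, which lines up with the near-baseline recovery reported in the tables below.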
@@ -191,26 +191,26 @@ evalplus.evaluate \
 
 | Metric | ibm-granite/granite-3.1-2b-instruct | neuralmagic-ent/granite-3.1-2b-instruct-quantized.w4a16 |
 |-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
-| ARC-Challenge (Acc-Norm, 25-shot) | 55.63 | 54.18
-| GSM8K (Strict-Match, 5-shot) | 60.96 |
-| HellaSwag (Acc-Norm, 10-shot) | 75.21 | 73.
-| MMLU (Acc, 5-shot) | 54.38 | 52.
-| TruthfulQA (MC2, 0-shot) | 55.93 |
-| Winogrande (Acc, 5-shot) | 69.67 | 69.
-| **Average Score** | **61.98** | **61.
-| **Recovery** | **100.00** | **99.
+| ARC-Challenge (Acc-Norm, 25-shot) | 55.63 | 54.18 |
+| GSM8K (Strict-Match, 5-shot) | 60.96 | 62.85 |
+| HellaSwag (Acc-Norm, 10-shot) | 75.21 | 73.36 |
+| MMLU (Acc, 5-shot) | 54.38 | 52.17 |
+| TruthfulQA (MC2, 0-shot) | 55.93 | 56.83 |
+| Winogrande (Acc, 5-shot) | 69.67 | 69.85 |
+| **Average Score** | **61.98** | **61.54** |
+| **Recovery** | **100.00** | **99.29** |
 
 #### OpenLLM Leaderboard V2 evaluation scores
 | Metric | ibm-granite/granite-3.1-2b-instruct | neuralmagic-ent/granite-3.1-2b-instruct-quantized.w4a16 |
 |-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
-| IFEval (Inst Level Strict Acc, 0-shot)| 67.99 |
-| BBH (Acc-Norm, 3-shot) | 44.11 |
-| Math-Hard (Exact-Match, 4-shot) | 8.66 |
-| GPQA (Acc-Norm, 0-shot) | 28.30 |
-| MUSR (Acc-Norm, 0-shot) | 35.12 |
-| MMLU-Pro (Acc, 5-shot) | 26.87 |
-| **Average Score** | **35.17** | **
-| **Recovery** | **100.00** | **
+| IFEval (Inst Level Strict Acc, 0-shot)| 67.99 | 67.63 |
+| BBH (Acc-Norm, 3-shot) | 44.11 | 43.22 |
+| Math-Hard (Exact-Match, 4-shot) | 8.66 | 8.77 |
+| GPQA (Acc-Norm, 0-shot) | 28.30 | 28.56 |
+| MUSR (Acc-Norm, 0-shot) | 35.12 | 35.26 |
+| MMLU-Pro (Acc, 5-shot) | 26.87 | 27.27 |
+| **Average Score** | **35.17** | **35.12** |
+| **Recovery** | **100.00** | **99.84** |
 
 
 #### HumanEval pass@1 scores
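The **Recovery** rows report the quantized model's average score as a percentage of the baseline average. A quick sanity check from the rounded per-task values above reproduces the card's numbers to within last-digit rounding (the small V1 drift suggests the card averages unrounded scores):

```python
# Recompute the table averages and Recovery from the rounded per-task scores.
def avg(xs):
    return sum(xs) / len(xs)

# OpenLLM Leaderboard V1 rows
base_v1 = [55.63, 60.96, 75.21, 54.38, 55.93, 69.67]
quant_v1 = [54.18, 62.85, 73.36, 52.17, 56.83, 69.85]
print(f"{avg(base_v1):.2f} {avg(quant_v1):.2f} {100 * avg(quant_v1) / avg(base_v1):.2f}")
# -> 61.96 61.54 99.32   (card: 61.98, 61.54, 99.29)

# OpenLLM Leaderboard V2 rows
base_v2 = [67.99, 44.11, 8.66, 28.30, 35.12, 26.87]
quant_v2 = [67.63, 43.22, 8.77, 28.56, 35.26, 27.27]
print(f"{avg(base_v2):.2f} {avg(quant_v2):.2f} {100 * avg(quant_v2) / avg(base_v2):.2f}")
# -> 35.17 35.12 99.84   (card: 35.17, 35.12, 99.84)
```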