nm-research committed
Commit c70235c · verified · 1 Parent(s): 7fd5827

Update README.md

Files changed (1)
  1. README.md +17 -17
README.md CHANGED
````diff
@@ -68,7 +68,7 @@ This model was created with [llm-compressor](https://github.com/vllm-project/llm
 
 
 ```bash
-python quantize.py --model_path ibm-granite/granite-3.1-2b-instruct --quant_path "output_dir/granite-3.1-2b-instruct-quantized.w4a16" --calib_size 2048 --dampening_frac 0.1 --observer mse
+python quantize.py --model_path ibm-granite/granite-3.1-2b-instruct --quant_path "output_dir/granite-3.1-2b-instruct-quantized.w4a16" --calib_size 1024 --dampening_frac 0.01 --observer mse --group_size 64
 ```
 
 
````
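The hunk above only changes quantization hyperparameters (fewer calibration samples, lighter Hessian dampening, an explicit group size). The `quantize.py` script itself is not part of this commit, so as context here is a minimal, hypothetical sketch of what a script with these flags might look like, assuming llm-compressor's `oneshot()`/`GPTQModifier` API; the calibration dataset and the wiring of `--observer`/`--group_size` into the recipe are assumptions, not confirmed by the diff.

```python
# Hypothetical sketch of a quantize.py matching the flags in the diff above.
# Assumes llm-compressor's oneshot() + GPTQModifier API; names and defaults
# here are illustrative only.
import argparse

from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

parser = argparse.ArgumentParser()
parser.add_argument("--model_path", required=True)
parser.add_argument("--quant_path", required=True)
parser.add_argument("--calib_size", type=int, default=1024)
parser.add_argument("--dampening_frac", type=float, default=0.01)
parser.add_argument("--observer", default="mse")           # consumed by the real recipe config
parser.add_argument("--group_size", type=int, default=64)  # likewise; the stock W4A16 scheme defaults to 128
args = parser.parse_args()

model = AutoModelForCausalLM.from_pretrained(
    args.model_path, torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(args.model_path)

# GPTQ W4A16: 4-bit grouped weight quantization, activations kept in 16-bit.
# dampening_frac scales the diagonal dampening added to the GPTQ Hessian for
# numerical stability; this commit lowers it from 0.1 to 0.01.
recipe = GPTQModifier(
    targets="Linear",
    scheme="W4A16",
    ignore=["lm_head"],
    dampening_frac=args.dampening_frac,
)

oneshot(
    model=model,
    dataset="open_platypus",                  # assumed calibration set, not stated in the diff
    recipe=recipe,
    num_calibration_samples=args.calib_size,  # commit lowers 2048 -> 1024
    max_seq_length=2048,
)

model.save_pretrained(args.quant_path, save_compressed=True)
tokenizer.save_pretrained(args.quant_path)
```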
````diff
@@ -191,26 +191,26 @@ evalplus.evaluate \
 
 | Metric | ibm-granite/granite-3.1-2b-instruct | neuralmagic-ent/granite-3.1-2b-instruct-quantized.w4a16 |
 |-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
-| ARC-Challenge (Acc-Norm, 25-shot) | 55.63 | 54.18 |
-| GSM8K (Strict-Match, 5-shot) | 60.96 | 63.61 |
-| HellaSwag (Acc-Norm, 10-shot) | 75.21 | 73.07 |
-| MMLU (Acc, 5-shot) | 54.38 | 52.88 |
-| TruthfulQA (MC2, 0-shot) | 55.93 | 58.30 |
-| Winogrande (Acc, 5-shot) | 69.67 | 69.77 |
-| **Average Score** | **61.98** | **61.97** |
-| **Recovery** | **100.00** | **99.98** |
+| ARC-Challenge (Acc-Norm, 25-shot) | 55.63 | 54.18 |
+| GSM8K (Strict-Match, 5-shot) | 60.96 | 62.85 |
+| HellaSwag (Acc-Norm, 10-shot) | 75.21 | 73.36 |
+| MMLU (Acc, 5-shot) | 54.38 | 52.17 |
+| TruthfulQA (MC2, 0-shot) | 55.93 | 56.83 |
+| Winogrande (Acc, 5-shot) | 69.67 | 69.85 |
+| **Average Score** | **61.98** | **61.54** |
+| **Recovery** | **100.00** | **99.29** |
 
 #### OpenLLM Leaderboard V2 evaluation scores
 | Metric | ibm-granite/granite-3.1-2b-instruct | neuralmagic-ent/granite-3.1-2b-instruct-quantized.w4a16 |
 |-----------------------------------------|:---------------------------------:|:-------------------------------------------:|
-| IFEval (Inst Level Strict Acc, 0-shot)| 67.99 | 56.12 |
-| BBH (Acc-Norm, 3-shot) | 44.11 | 41.32 |
-| Math-Hard (Exact-Match, 4-shot) | 8.66 | 7.38 |
-| GPQA (Acc-Norm, 0-shot) | 28.30 | 27.64 |
-| MUSR (Acc-Norm, 0-shot) | 35.12 | 33.95 |
-| MMLU-Pro (Acc, 5-shot) | 26.87 | 25.70 |
-| **Average Score** | **35.17** | **32.02** |
-| **Recovery** | **100.00** | **91.03** |
+| IFEval (Inst Level Strict Acc, 0-shot)| 67.99 | 67.63 |
+| BBH (Acc-Norm, 3-shot) | 44.11 | 43.22 |
+| Math-Hard (Exact-Match, 4-shot) | 8.66 | 8.77 |
+| GPQA (Acc-Norm, 0-shot) | 28.30 | 28.56 |
+| MUSR (Acc-Norm, 0-shot) | 35.12 | 35.26 |
+| MMLU-Pro (Acc, 5-shot) | 26.87 | 27.27 |
+| **Average Score** | **35.17** | **35.12** |
+| **Recovery** | **100.00** | **99.84** |
 
 
 #### HumanEval pass@1 scores
````
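For readers checking the updated tables, the **Recovery** row is simply the quantized model's average score as a percentage of the baseline's. A one-line check against the updated V1 averages follows; the V2 row is computed the same way from unrounded per-task scores, so a hand calculation from the rounded table values can differ in the last digit.

```python
# Recovery = quantized average / baseline average, in percent,
# using the updated OpenLLM V1 averages from the first table.
quant_avg, base_avg = 61.54, 61.98
print(f"{100 * quant_avg / base_avg:.2f}%")  # 99.29% -- matches the Recovery row
```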
 