Update README.md
README.md
````diff
@@ -125,7 +125,43 @@ tokenizer.push_to_hub(save_to)
 ```
 
 # Model Quality
-
+We rely on [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) to evaluate the quality of the quantized model.
+
+| Benchmark | | |
+|----------------------------------|----------------|---------------------------|
+| | Qwen3-8B | Qwen3-8B-int4wo |
+| **General** | | |
+| mmlu | 73.04 | 70.4 |
+| mmlu_pro | 53.81 | 52.79 |
+| bbh | 79.33 | 74.92 |
+| **Multilingual** | | |
+| mgsm_en_cot_en | 39.6 | 33.2 |
+| m_mmlu (avg) | 57.17 | 54.06 |
+| **Math** | | |
+| gpqa_main_zeroshot | 35.71 | 32.14 |
+| gsm8k | 87.79 | 86.28 |
+| leaderboard_math_hard (v3) | 53.7 | 46.83 |
+| **Overall** | 60.02 | 56.33 |
+
+<details>
+<summary> Reproduce Model Quality Results </summary>
+
+Need to install lm-eval from source:
+https://github.com/EleutherAI/lm-evaluation-harness#install
+
+## baseline
+```Shell
+lm_eval --model hf --model_args pretrained=microsoft/Phi-4-mini-instruct --tasks mmlu --device cuda:0 --batch_size 8
+```
+
+## float8 dynamic quantization (float8dq)
+```Shell
+export MODEL=pytorch/Qwen3-32B-float8dq
+# or
+# export MODEL=Qwen/Qwen3-32B
+lm_eval --model hf --model_args pretrained=$MODEL --tasks mmlu --device cuda:0 --batch_size 8
+```
+</details>
 
 # Memory Usage
 
````
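The **Overall** row in the Model Quality table appears to be the plain, unweighted mean of the eight benchmark scores listed above it. The README does not state this; it is inferred because the numbers match. The short sketch below recomputes both Overall values under that assumption, using only the scores copied from the table.

```python
# Scores copied from the Model Quality table: (Qwen3-8B, Qwen3-8B-int4wo).
# Assumption: "Overall" is the unweighted mean of these eight benchmark rows.
scores = {
    "mmlu": (73.04, 70.4),
    "mmlu_pro": (53.81, 52.79),
    "bbh": (79.33, 74.92),
    "mgsm_en_cot_en": (39.6, 33.2),
    "m_mmlu (avg)": (57.17, 54.06),
    "gpqa_main_zeroshot": (35.71, 32.14),
    "gsm8k": (87.79, 86.28),
    "leaderboard_math_hard (v3)": (53.7, 46.83),
}


def overall(column: int) -> float:
    """Unweighted mean of one column (0 = Qwen3-8B, 1 = Qwen3-8B-int4wo), rounded to 2 decimals."""
    values = [pair[column] for pair in scores.values()]
    return round(sum(values) / len(values), 2)


baseline = overall(0)
quantized = overall(1)
print(f"Qwen3-8B overall:        {baseline}")
print(f"Qwen3-8B-int4wo overall: {quantized}")
print(f"Absolute drop:           {baseline - quantized:.2f}")
```

Both recomputed values agree with the table, which supports reading Overall as a simple average in which each benchmark counts equally, regardless of its category.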
````diff
@@ -135,7 +171,7 @@ TODO
 | Peak Memory | 65.72 GB | 34.54 GB (-47.44%) |
 
 <details>
-<summary> Reproduce
+<summary> Reproduce Peak Memory Usage Results </summary>
 
 Code
 ```Py
````
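The `-47.44%` in the Peak Memory row reads as the relative change from the first figure (65.72 GB) to the second (34.54 GB). A minimal check, assuming the percentage is computed as `(new - old) / old * 100` on the two GB values shown (the formula is an assumption; the README only gives the result):

```python
# Peak memory figures from the Memory Usage row, in GB.
# Assumption: the first column is the unquantized model, the second the quantized one.
baseline_gb = 65.72
quantized_gb = 34.54


def relative_change(old: float, new: float) -> float:
    """Percentage change from old to new; a negative value means a reduction."""
    return (new - old) / old * 100


change = relative_change(baseline_gb, quantized_gb)
# Format the result the same way as the table cell.
print(f"{quantized_gb} GB ({change:.2f}%)")
```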