jerryzh168 committed on
Commit
ccf920f
·
verified ·
1 Parent(s): 2c27924

Update README.md

  1. README.md +38 -2
README.md CHANGED
@@ -125,7 +125,43 @@ tokenizer.push_to_hub(save_to)
125
  ```
126
 
127
  # Model Quality
128
- TODO
129
 
130
  # Memory Usage
131
 
@@ -135,7 +171,7 @@ TODO
135
  | Peak Memory | 65.72 GB | 34.54 GB (-47.44%) |
136
 
137
  <details>
138
- <summary> Reproduce peak memory usage </summary>
139
 
140
  Code
141
  ```Py
 
125
  ```
126
 
127
  # Model Quality
128
+ We rely on [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) to evaluate the quality of the quantized model.
129
+
130
+ | Benchmark | | |
131
+ |----------------------------------|----------------|---------------------------|
132
+ | | Qwen3-8B | Qwen3-8B-int4wo |
133
+ | **General** | | |
134
+ | mmlu | 73.04 | 70.4 |
135
+ | mmlu_pro | 53.81 | 52.79 |
136
+ | bbh | 79.33 | 74.92 |
137
+ | **Multilingual** | | |
138
+ | mgsm_en_cot_en | 39.6 | 33.2 |
139
+ | m_mmlu (avg) | 57.17 | 54.06 |
140
+ | **Math** | | |
141
+ | gpqa_main_zeroshot | 35.71 | 32.14 |
142
+ | gsm8k | 87.79 | 86.28 |
143
+ | leaderboard_math_hard (v3) | 53.7 | 46.83 |
144
+ | **Overall** | 60.02 | 56.33 |
145
+
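The **Overall** row in the table above looks like the unweighted mean of the eight benchmark scores. The README does not say so, so treat that as an assumption; the sketch below only checks that the arithmetic agrees. The dictionary keys are the table's row labels and the helper name `overall` is hypothetical.

```python
# Scores copied from the table above as (Qwen3-8B, Qwen3-8B-int4wo).
scores = {
    "mmlu": (73.04, 70.4),
    "mmlu_pro": (53.81, 52.79),
    "bbh": (79.33, 74.92),
    "mgsm_en_cot_en": (39.6, 33.2),
    "m_mmlu (avg)": (57.17, 54.06),
    "gpqa_main_zeroshot": (35.71, 32.14),
    "gsm8k": (87.79, 86.28),
    "leaderboard_math_hard (v3)": (53.7, 46.83),
}


def overall(column: int) -> float:
    """Unweighted mean of one table column, rounded to two decimals.

    Assumption: the README's Overall row is a plain average over all rows.
    """
    values = [pair[column] for pair in scores.values()]
    return round(sum(values) / len(values), 2)


baseline_overall = overall(0)
quantized_overall = overall(1)
print(baseline_overall, quantized_overall)
```

Under that assumption both values reproduce the Overall row, so each benchmark carries equal weight regardless of its category.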
146
+ <details>
147
+ <summary> Reproduce Model Quality Results </summary>
148
+
149
+ lm-eval must be installed from source:
150
+ https://github.com/EleutherAI/lm-evaluation-harness#install
151
+
152
+ ## baseline
153
+ ```Shell
154
+ lm_eval --model hf --model_args pretrained=microsoft/Phi-4-mini-instruct --tasks mmlu --device cuda:0 --batch_size 8
155
+ ```
156
+
157
+ ## float8 dynamic quantization (float8dq)
158
+ ```Shell
159
+ export MODEL=pytorch/Qwen3-32B-float8dq
160
+ # or
161
+ # export MODEL=Qwen/Qwen3-32B
162
+ lm_eval --model hf --model_args pretrained=$MODEL --tasks mmlu --device cuda:0 --batch_size 8
163
+ ```
164
+ </details>
165
 
166
  # Memory Usage
167
 
 
171
  | Peak Memory | 65.72 GB | 34.54 GB (-47.44%) |
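The percentage in the Peak Memory row can be checked as a relative change against the baseline. This is a minimal sketch that assumes the figure is computed as (quantized - baseline) / baseline; the function name `percent_change` is hypothetical.

```python
def percent_change(baseline_gb: float, quantized_gb: float) -> float:
    """Relative change of quantized vs. baseline peak memory, in percent."""
    return round((quantized_gb - baseline_gb) / baseline_gb * 100, 2)


# Values taken from the Peak Memory row.
change = percent_change(65.72, 34.54)
print(f"{change}%")
```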
172
 
173
  <details>
174
+ <summary> Reproduce Peak Memory Usage Results </summary>
175
 
176
  Code
177
  ```Py