guangy10 committed · verified
Commit adde4e0 · Parent(s): 125d167

Updated with model quality (partial)

Files changed (1): README.md (+36 -0)
README.md CHANGED
@@ -175,6 +175,42 @@ The response from the manual testing is:
  ```
  Okay, the user is asking if I can talk to them. First, I need to clarify that I can't communicate like a human because I don't have consciousness or emotions. I'm an AI model created by Hugging Face.
  ```
+ # Model Quality
+
+ | Benchmark                        | SmolLM3-3B | SmolLM3-3B-8da4w |
+ |----------------------------------|------------|------------------|
+ | **Popular aggregated benchmark** |            |                  |
+ | mmlu                             | -          | -                |
+ | mmlu_pro                         | -          | -                |
+ | bbh                              | -          | -                |
+ | **Reasoning**                    |            |                  |
+ | gpqa_main_zeroshot               | -          | 27.46            |
+ | **Multilingual**                 |            |                  |
+ | m_mmlu                           | -          | -                |
+ | mgsm_en_cot_en                   | -          | 40.40            |
+ | **Math**                         |            |                  |
+ | gsm8k                            | -          | 58.08            |
+ | leaderboard_math_hard (v3)       | -          | 19.94            |
+ | **Overall**                      | -          | -                |
+
+ Entries marked "-" have not been measured yet; these results are partial.
+
+ <details>
+ <summary> Reproduce Model Quality Results </summary>
+
+ We rely on [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) to evaluate the quality of the quantized model.
+
+ You need to install lm-eval from source: https://github.com/EleutherAI/lm-evaluation-harness#install
+
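+ For reference, a minimal source install, following the install instructions in the lm-eval README:
+
+ ```Shell
+ # Clone lm-evaluation-harness and install it in editable mode
+ git clone https://github.com/EleutherAI/lm-evaluation-harness
+ cd lm-evaluation-harness
+ pip install -e .
+ ```
+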
+ ## baseline
+ ```Shell
+ lm_eval --model hf --model_args pretrained=HuggingFaceTB/SmolLM3-3B --tasks mmlu --device cuda:0 --batch_size auto
+ ```
+
+ ## int8 dynamic activation and int4 weight quantization (8da4w)
+ ```Shell
+ lm_eval --model hf --model_args pretrained=pytorch/SmolLM3-3B-8da4w --tasks mmlu --device cuda:0 --batch_size auto
+ ```
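+
+ Both commands above run only the mmlu task. To reproduce the other rows, swap in the matching task name from the table; a sketch, assuming each row label is also the registered lm-eval task name:
+
+ ```Shell
+ # Hypothetical one-shot run over the remaining benchmarks in the table
+ lm_eval --model hf --model_args pretrained=pytorch/SmolLM3-3B-8da4w \
+   --tasks mmlu_pro,bbh,gpqa_main_zeroshot,m_mmlu,mgsm_en_cot_en,gsm8k,leaderboard_math_hard \
+   --device cuda:0 --batch_size auto
+ ```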
+ </details>

  # Disclaimer
  PyTorch has not performed safety evaluations or red teamed the quantized models. Performance characteristics, outputs, and behaviors may differ from the original models. Users are solely responsible for selecting appropriate use cases, evaluating and mitigating for accuracy, safety, and fairness, ensuring security, and complying with all applicable laws and regulations.