jerryzh168 committed on
Commit 23ccbd2 · verified · 1 Parent(s): ccf920f

Update README.md

Files changed (1)
  1. README.md +10 -10
README.md CHANGED
@@ -129,19 +129,19 @@ We rely on [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-h
 
 | Benchmark | | |
 |----------------------------------|----------------|---------------------------|
-| | Qwen3-8B | Qwen3-8B-int4wo |
+| | Qwen3-32B | Qwen3-32B-float8dq |
 | **General** | | |
-| mmlu | 73.04 | 70.4 |
-| mmlu_pro | 53.81 | 52.79 |
-| bbh | 79.33 | 74.92 |
+| mmlu | WIP | WIP |
+| mmlu_pro | WIP | WIP |
+| bbh | WIP | WIP |
 | **Multilingual** | | |
-| mgsm_en_cot_en | 39.6 | 33.2 |
-| m_mmlu (avg) | 57.17 | 54.06 |
+| mgsm_en_cot_en | WIP | WIP |
+| m_mmlu (avg) | WIP | WIP |
 | **Math** | | |
-| gpqa_main_zeroshot | 35.71 | 32.14 |
-| gsm8k | 87.79 | 86.28 |
-| leaderboard_math_hard (v3) | 53.7 | 46.83 |
-| **Overall** | 60.02 | 56.33 |
+| gpqa_main_zeroshot | WIP | WIP |
+| gsm8k | WIP | WIP |
+| leaderboard_math_hard (v3) | WIP | WIP |
+| **Overall** | WIP | WIP |
 
 <details>
 <summary> Reproduce Model Quality Results </summary>
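
Note on the **Overall** row: the removed Qwen3-8B values appear consistent with an unweighted mean of the eight benchmark rows above them. The README does not state this, so treat it as an assumption; once the WIP entries for Qwen3-32B are filled in, the same calculation could presumably be applied. A minimal sketch, where the `overall` helper and the dictionary names are hypothetical and used only for illustration:

```python
# Scores copied from the removed (Qwen3-8B) rows of the hunk above.
baseline = {
    "mmlu": 73.04,
    "mmlu_pro": 53.81,
    "bbh": 79.33,
    "mgsm_en_cot_en": 39.6,
    "m_mmlu (avg)": 57.17,
    "gpqa_main_zeroshot": 35.71,
    "gsm8k": 87.79,
    "leaderboard_math_hard (v3)": 53.7,
}
quantized = {
    "mmlu": 70.4,
    "mmlu_pro": 52.79,
    "bbh": 74.92,
    "mgsm_en_cot_en": 33.2,
    "m_mmlu (avg)": 54.06,
    "gpqa_main_zeroshot": 32.14,
    "gsm8k": 86.28,
    "leaderboard_math_hard (v3)": 46.83,
}


def overall(scores):
    """Unweighted mean of the per-benchmark scores, rounded to two decimals.

    Assumption: every benchmark row carries equal weight in the Overall row.
    """
    return round(sum(scores.values()) / len(scores), 2)


print(overall(baseline))   # 60.02, matching the removed Overall cell
print(overall(quantized))  # 56.33, matching the removed Overall cell
```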