More metrics update
README.md
@@ -186,13 +186,13 @@ Okay, the user is asking if I can talk to them. First, I need to clarify that I
 | bbh | - | - |
 | **Reasoning** | | |
 | hellaswag | - | 54.39 |
-| gpqa_main_zeroshot |
+| gpqa_main_zeroshot | 32.37 | 27.46 |
 | **Multilingual** | | |
 | m_mmlu | - | - |
-| mgsm_en_cot_en |
+| mgsm_en_cot_en | 66.80 | 40.40 |
 | **Math** | | |
-| gsm8k |
-| leaderboard_math_hard (v3) |
+| gsm8k | 72.71 | 58.08 |
+| leaderboard_math_hard (v3) | 27.87 | 19.94 |
 | **Overall** | - | - |
 
 <details>