Update README.md
README.md
CHANGED
@@ -53,23 +53,23 @@ As a powerful yet computationally efficient large model, Hunyuan-A13B is an idea
 Note: The following benchmarks are evaluated by TRT-LLM-backend
 
-| Model | Hunyuan-Large | Qwen2.5-72B
-
-| MMLU | 88.
-| MMLU-Pro | 60.20 | 58.10 |
-| MMLU-Redux | 87.47 | 83.90 |
-| BBH | 86.30 | 85.
-| SuperGPQA | 38.90 |
-| EvalPlus | 75.69 |
-| MultiPL-E | 59.13 |
-| MBPP | 72.60 |
-| CRUX-
-|
-|
-|
-|
-|
-
+| Model | Hunyuan-Large | Qwen2.5-72B | Qwen3-A22B | Hunyuan-A13B |
+|------------------|---------------|--------------|-------------|---------------|
+| MMLU | 88.40 | 86.10 | 87.81 | 88.17 |
+| MMLU-Pro | 60.20 | 58.10 | 68.18 | 67.23 |
+| MMLU-Redux | 87.47 | 83.90 | 87.40 | 87.67 |
+| BBH | 86.30 | 85.80 | 88.87 | 87.56 |
+| SuperGPQA | 38.90 | 36.20 | 44.06 | 41.32 |
+| EvalPlus | 75.69 | 65.93 | 77.60 | 78.64 |
+| MultiPL-E | 59.13 | 60.50 | 65.94 | 69.33 |
+| MBPP | 72.60 | 76.00 | 81.40 | 83.86 |
+| CRUX-I | 57.00 | 57.63 | - | 70.13 |
+| CRUX-O | 60.63 | 66.20 | 79.00 | 77.00 |
+| MATH | 69.80 | 62.12 | 71.84 | 72.35 |
+| CMATH | 91.30 | 84.80 | - | 91.17 |
+| GSM8k | 92.80 | 91.50 | 94.39 | 91.83 |
+| GPQA | 25.18 | 45.90 | 47.47 | 49.12 |
+