jaxchang committed on
Commit 54b75d5 · verified · 1 Parent(s): fee83ca

Update README.md

Files changed (1)
  1. README.md +10 -10
README.md CHANGED
@@ -76,16 +76,16 @@ Note: The following benchmarks are evaluated by TRT-LLM-backend
 
 Hunyuan-A13B-Instruct has achieved highly competitive performance across multiple benchmarks, particularly in mathematics, science, agent domains, and more. We compared it with several powerful models, and the results are shown below.
 
- | Topic | Bench | OpenAI-o1-1217 | DeepSeek R1 | Qwen3-A22B | Hunyuan-A13B-Instruct |
- |:-------------------:|:-----------------------------:|:-------------:|:------------:|:-----------:|:---------------------:|
- | **Mathematics** | AIME 2024<br>AIME 2025<br>MATH | 74.3<br>79.2<br>96.4 | 79.8<br>70<br>94.9 | 85.7<br>81.5<br>94.0 | 87.3<br>76.8<br>94.3 |
- | **Science** | GPQA-Diamond<br>OlympiadBench | 78<br>83.1 | 71.5<br>82.4 | 71.1<br>85.7 | 71.2<br>82.7 |
- | **Coding** | Livecodebench<br>Fullstackbench<br>ArtifactsBench | 63.9<br>64.6<br>38.6 | 65.9<br>71.6<br>44.6 | 70.7<br>65.6<br>44.6 | 63.9<br>67.8<br>43 |
- | **Reasoning** | BBH<br>DROP<br>ZebraLogic | 80.4<br>90.2<br>81 | 83.7<br>92.2<br>78.7 | 88.9<br>90.3<br>80.3 | 89.1<br>91.1<br>84.7 |
- | **Instruction<br>Following** | IF-Eval<br>SysBench | 91.8<br>82.5 | 88.3<br>77.7 | 83.4<br>74.2 | 84.7<br>76.1 |
- | **Text<br>Creation**| LengthCtrl<br>InsCtrl | 60.1<br>74.8 | 55.9<br>69 | 53.3<br>73.7 | 55.4<br>71.9 |
- | **NLU** | ComplexNLU<br>Word-Task | 64.7<br>67.1 | 64.5<br>76.3 | 59.8<br>56.4 | 61.2<br>62.9 |
- | **Agent** | BDCL v3<br> τ-Bench<br>ComplexFuncBench<br> C3-Bench | 67.8<br>60.4<br>47.6<br>58.8 | 56.9<br>43.8<br>41.1<br>55.3 | 70.8<br>44.6<br>40.6<br>51.7 | 78.3<br>54.7<br>61.2<br>63.5 |
+ | **Topic** | **Bench** | **OpenAI-o1-1217** | **DeepSeek R1** | **Qwen3-A22B** | **Hunyuan-A13B-Instruct** |
+ | :--------------------------: | :------------------------------------------------: | :------------------------------: | :--------------------------: | :--------------------------: | :--------------------------------------: |
+ | **Mathematics** | AIME 2024<br>AIME 2025<br>MATH | 74.3<br>79.2<br>**96.4** | 79.8<br>70<br>94.9 | 85.7<br>**81.5**<br>94.0 | **87.3**<br>76.8<br>94.3 |
+ | **Science** | GPQA-Diamond<br>OlympiadBench | **78**<br>83.1 | 71.5<br>82.4 | 71.1<br>**85.7** | 71.2<br>82.7 |
+ | **Coding** | Livecodebench<br>Fullstackbench<br>ArtifactsBench | 63.9<br>64.6<br>38.6 | 65.9<br>**71.6**<br>**44.6** | **70.7**<br>65.6<br>**44.6** | 63.9<br>67.8<br>43 |
+ | **Reasoning** | BBH<br>DROP<br>ZebraLogic | 80.4<br>90.2<br>81 | 83.7<br>**92.2**<br>78.7 | 88.9<br>90.3<br>80.3 | **89.1**<br>91.1<br>**84.7** |
+ | **Instruction<br>Following** | IF-Eval<br>SysBench | **91.8**<br>**82.5** | 88.3<br>77.7 | 83.4<br>74.2 | 84.7<br>76.1 |
+ | **Text<br>Creation** | LengthCtrl<br>InsCtrl | **60.1**<br>**74.8** | 55.9<br>69 | 53.3<br>73.7 | 55.4<br>71.9 |
+ | **NLU** | ComplexNLU<br>Word-Task | **64.7**<br>67.1 | 64.5<br>**76.3** | 59.8<br>56.4 | 61.2<br>62.9 |
+ | **Agent** | BDCL v3<br>τ-Bench<br>ComplexFuncBench<br>C3-Bench | 67.8<br>**60.4**<br>47.6<br>58.8 | 56.9<br>43.8<br>41.1<br>55.3 | 70.8<br>44.6<br>40.6<br>51.7 | **78.3**<br>54.7<br>**61.2**<br>**63.5** |
 
 
 &nbsp;