binwang committed on
Commit 4c16719 · verified · 1 Parent(s): 86ecf17

Upload organize_model_results.json with huggingface_hub

Files changed (1)
  1. organize_model_results.json +2046 -0
organize_model_results.json ADDED
@@ -0,0 +1,2046 @@
1
+ {
2
+ "ukusnews_short_test": {
3
+ "wer": {
4
+ "whisper_large_v3": 0.06168908700151238,
5
+ "Qwen-Audio-Chat": 0.10399586086125925,
6
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 0.06877338215394412,
7
+ "WavLLM_fairseq": 0.2066783411605508,
8
+ "Qwen2-Audio-7B-Instruct": 0.1194380323171217,
9
+ "SALMONN_7B": 0.09042426172092653,
10
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 0.10144869855926132,
11
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 0.0700867627159118
12
+ }
13
+ },
14
+ "imda_part6_30s_asr_test": {
15
+ "wer": {
16
+ "whisper_large_v3": 0.1698509342851144,
17
+ "Qwen-Audio-Chat": 0.31394240863063033,
18
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 0.1789273082575623,
19
+ "WavLLM_fairseq": 0.42541061709652933,
20
+ "Qwen2-Audio-7B-Instruct": 0.2245352799625317,
21
+ "SALMONN_7B": 0.24872817713464365,
22
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 0.11292172031202054,
23
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 0.17467982364056267
24
+ }
25
+ },
26
+ "covost2_en_id_test": {
27
+ "bleu": {
28
+ "whisper_large_v3": 1.600581653970121,
29
+ "Qwen-Audio-Chat": 4.102230932924371,
30
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 27.620150160643625,
31
+ "WavLLM_fairseq": 13.841886973016162,
32
+ "Qwen2-Audio-7B-Instruct": 16.325186897428104,
33
+ "SALMONN_7B": 14.102682915273142,
34
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 37.60224687716629,
35
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 10.930203684508578
36
+ }
37
+ },
38
+ "imda_part3_30s_asr_test": {
39
+ "wer": {
40
+ "whisper_large_v3": 0.27026366524560785,
41
+ "Qwen-Audio-Chat": 0.6412550574306894,
42
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 0.3035544573275043,
43
+ "WavLLM_fairseq": 0.7540934640345399,
44
+ "Qwen2-Audio-7B-Instruct": 0.35076166942732234,
45
+ "SALMONN_7B": 0.6569229098215983,
46
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 0.2919053954978684,
47
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 0.29992939962527493
48
+ }
49
+ },
50
+ "gigaspeech_test": {
51
+ "wer": {
52
+ "whisper_large_v3": 0.09459022434812692,
53
+ "Qwen-Audio-Chat": 0.13018910022587737,
54
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 0.09948381629977261,
55
+ "WavLLM_fairseq": 0.15491778414546403,
56
+ "Qwen2-Audio-7B-Instruct": 0.11723812890302816,
57
+ "SALMONN_7B": 0.10765150204693537,
58
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 0.14457154747310655,
59
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 0.09515429104337297
60
+ }
61
+ },
62
+ "covost2_ta_en_test": {
63
+ "bleu": {
64
+ "whisper_large_v3": 2.451098639578599,
65
+ "Qwen-Audio-Chat": 0.01699144301093184,
66
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 2.8327095799289337,
67
+ "WavLLM_fairseq": 0.1695522548322915,
68
+ "Qwen2-Audio-7B-Instruct": 0.04425838146050298,
69
+ "SALMONN_7B": 0.3649023706010388,
70
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 5.023057608950299,
71
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 2.4245628096245917
72
+ }
73
+ },
74
+ "librispeech_test_other": {
75
+ "wer": {
76
+ "whisper_large_v3": 0.03660128246354058,
77
+ "Qwen-Audio-Chat": 0.043467569561352074,
78
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 0.05307658841999735,
79
+ "WavLLM_fairseq": 0.04798834811886432,
80
+ "Qwen2-Audio-7B-Instruct": 0.060415760304159495,
81
+ "SALMONN_7B": 0.09671439650443565,
82
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 0.041576030415949455,
83
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 0.03714982881570734
84
+ }
85
+ },
86
+ "parliament_test": {
87
+ "wer": {
88
+ "whisper_large_v3": 0.0753619074652285,
89
+ "Qwen-Audio-Chat": 0.26279685873781816,
90
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 0.06282524363705176,
91
+ "WavLLM_fairseq": 0.5216434856656259,
92
+ "Qwen2-Audio-7B-Instruct": 0.23270886555019396,
93
+ "SALMONN_7B": 0.3010928186204939,
94
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 0.058922319992430694,
95
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 0.07517267480367111
96
+ }
97
+ },
98
+ "earnings22_test": {
99
+ "wer": {
100
+ "whisper_large_v3": 0.15887899737116104,
101
+ "Qwen-Audio-Chat": 0.3664994875132684,
102
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 0.1448629161356777,
103
+ "WavLLM_fairseq": 0.6671766188447099,
104
+ "Qwen2-Audio-7B-Instruct": 0.23542555661330924,
105
+ "SALMONN_7B": 0.3597423676988383,
106
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 0.1652245056860175,
107
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 0.15611126487402763
108
+ }
109
+ },
110
+ "imda_part2_asr_test": {
111
+ "wer": {
112
+ "whisper_large_v3": 0.3171008846684522,
113
+ "Qwen-Audio-Chat": 0.45479263046830615,
114
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 0.32988393799204613,
115
+ "WavLLM_fairseq": 0.4463923382842302,
116
+ "Qwen2-Audio-7B-Instruct": 0.1905689473257041,
117
+ "SALMONN_7B": 0.42346400454508565,
118
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 0.048088629169710254,
119
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 0.31912994075156237
120
+ }
121
+ },
122
+ "ukusnews_test": {
123
+ "wer": {
124
+ "whisper_large_v3": 0.07135564378899603,
125
+ "Qwen-Audio-Chat": 0.3158631121194933,
126
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 0.07388920400831915,
127
+ "WavLLM_fairseq": 0.5911892607298166,
128
+ "Qwen2-Audio-7B-Instruct": 0.13843826810361126,
129
+ "SALMONN_7B": 0.18918510115333712,
130
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 0.12554358101720553,
131
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 0.07642276422764227
132
+ }
133
+ },
134
+ "earnings21_test": {
135
+ "wer": {
136
+ "whisper_large_v3": 0.11863959266711877,
137
+ "Qwen-Audio-Chat": 0.2655529121410546,
138
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 0.11416493424197618,
139
+ "WavLLM_fairseq": 0.6447482518259942,
140
+ "Qwen2-Audio-7B-Instruct": 0.18872219319407232,
141
+ "SALMONN_7B": 0.2577708974886327,
142
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 0.13488732754499672,
143
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 0.11773910240019567
144
+ }
145
+ },
146
+ "covost2_zh_en_test": {
147
+ "bleu": {
148
+ "whisper_large_v3": 14.673689493155793,
149
+ "Qwen-Audio-Chat": 9.898238298955656,
150
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 15.209998552437538,
151
+ "WavLLM_fairseq": 2.368659001743569,
152
+ "Qwen2-Audio-7B-Instruct": 16.466557744958333,
153
+ "SALMONN_7B": 5.296039450108202,
154
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 18.76473995941838,
155
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 14.154700735606419
156
+ }
157
+ },
158
+ "covost2_en_ta_test": {
159
+ "bleu": {
160
+ "whisper_large_v3": 0.02107778621423822,
161
+ "Qwen-Audio-Chat": 0.03451483807236294,
162
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 8.433062902024755,
163
+ "WavLLM_fairseq": 0.0033159224040994286,
164
+ "Qwen2-Audio-7B-Instruct": 0.03245972071872916,
165
+ "SALMONN_7B": 0.00046745670226766583,
166
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 14.407399367512914,
167
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 1.0368044741318085
168
+ }
169
+ },
170
+ "librispeech_test_clean": {
171
+ "wer": {
172
+ "whisper_large_v3": 0.01878749009695552,
173
+ "Qwen-Audio-Chat": 0.020258799562379748,
174
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 0.032349945297468596,
175
+ "WavLLM_fairseq": 0.02103218017882069,
176
+ "Qwen2-Audio-7B-Instruct": 0.035141660693401744,
177
+ "SALMONN_7B": 0.10270871845172973,
178
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 0.022918474365262006,
179
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 0.018334779492209605
180
+ }
181
+ },
182
+ "tedlium3_test": {
183
+ "wer": {
184
+ "whisper_large_v3": 0.037649480146197796,
185
+ "Qwen-Audio-Chat": 0.04052375714133636,
186
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 0.04900464852205386,
187
+ "WavLLM_fairseq": 0.06621482559171073,
188
+ "Qwen2-Audio-7B-Instruct": 0.06114048472375004,
189
+ "SALMONN_7B": 0.0459884319222171,
190
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 0.07884745040985061,
191
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 0.038146268762641496
192
+ }
193
+ },
194
+ "imda_part1_asr_test": {
195
+ "wer": {
196
+ "whisper_large_v3": 0.06844171360300393,
197
+ "Qwen-Audio-Chat": 0.10550313315290274,
198
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 0.07041669714480775,
199
+ "WavLLM_fairseq": 0.10077292565771828,
200
+ "Qwen2-Audio-7B-Instruct": 0.07197717796796138,
201
+ "SALMONN_7B": 0.0925804013361617,
202
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 0.042254894789457,
203
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 0.06922195401458074
204
+ }
205
+ },
206
+ "common_voice_15_en_test": {
207
+ "wer": {
208
+ "whisper_large_v3": 0.10001863741235596,
209
+ "Qwen-Audio-Chat": 0.11272421128398918,
210
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 0.10600831614192711,
211
+ "WavLLM_fairseq": 0.14533325621300636,
212
+ "Qwen2-Audio-7B-Instruct": 0.11438872500819404,
213
+ "SALMONN_7B": 0.3062255383962828,
214
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 0.07811646454714301,
215
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 0.09876543209876543
216
+ }
217
+ },
218
+ "mediacorp_test": {
219
+ "wer": {
220
+ "whisper_large_v3": 0.12054884024828487,
221
+ "Qwen-Audio-Chat": 0.4498529892192094,
222
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 0.12455080039202875,
223
+ "WavLLM_fairseq": 0.3595230316889905,
224
+ "Qwen2-Audio-7B-Instruct": 0.18694870957203527,
225
+ "SALMONN_7B": 0.32089186540346293,
226
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 0.170859196341065,
227
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 0.13598497223129696
228
+ }
229
+ },
230
+ "idpc_short_test": {
231
+ "wer": {
232
+ "whisper_large_v3": 0.1662526275558953,
233
+ "Qwen-Audio-Chat": 0.6008025988916491,
234
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 0.16931014714313014,
235
+ "WavLLM_fairseq": 0.36728454041658704,
236
+ "Qwen2-Audio-7B-Instruct": 0.21326199120963119,
237
+ "SALMONN_7B": 0.26313777947639977,
238
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 0.24918784635964075,
239
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 0.15803554366520162
240
+ }
241
+ },
242
+ "seame_dev_man": {
243
+ "wer": {
244
+ "whisper_large_v3": 0.7225930420711975,
245
+ "Qwen-Audio-Chat": 0.8783373786407767,
246
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 0.7824973031283711,
247
+ "WavLLM_fairseq": 1.2913969795037756,
248
+ "Qwen2-Audio-7B-Instruct": 0.5522518878101402,
249
+ "SALMONN_7B": 1.2721817691477886,
250
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 0.388282092772384,
251
+ "gemini-1.5-flash": 0.9690871089536138,
252
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 0.6848705501618123
253
+ }
254
+ },
255
+ "cna_test": {
256
+ "wer": {
257
+ "whisper_large_v3": 0.13841717398269784,
258
+ "Qwen-Audio-Chat": 0.19753284203780838,
259
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 0.15171419416853574,
260
+ "WavLLM_fairseq": 0.26946491509131687,
261
+ "Qwen2-Audio-7B-Instruct": 0.2067713339741536,
262
+ "SALMONN_7B": 0.15395706504325538,
263
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 0.15924383210509452,
264
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 0.13798996048275125
265
+ }
266
+ },
267
+ "ytb_asr_batch1": {
268
+ "wer": {
269
+ "whisper_large_v3": 0.12226319428439733,
270
+ "Qwen-Audio-Chat": 0.2297764461857571,
271
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 0.1400092187139894,
272
+ "WavLLM_fairseq": 0.41876008296842593,
273
+ "Qwen2-Audio-7B-Instruct": 0.16843358684796805,
274
+ "SALMONN_7B": 0.21487285856956287,
275
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 0.11484981178458939,
276
+ "gemini-1.5-flash": 0.1089344703080587,
277
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 0.12579703464700007
278
+ }
279
+ },
280
+ "mediacorp_short_test": {
281
+ "wer": {
282
+ "whisper_large_v3": 0.11715763436024286,
283
+ "Qwen-Audio-Chat": 0.2548909377108163,
284
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 0.14571621317742298,
285
+ "WavLLM_fairseq": 0.2621992354396222,
286
+ "Qwen2-Audio-7B-Instruct": 0.17180121430177647,
287
+ "SALMONN_7B": 0.1751742747919946,
288
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 0.13301101866426804,
289
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 0.11434675061839443
290
+ }
291
+ },
292
+ "peoples_speech_test": {
293
+ "wer": {
294
+ "whisper_large_v3": 0.14602420615337386,
295
+ "Qwen-Audio-Chat": 0.31419144746723354,
296
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 0.20140159998943682,
297
+ "WavLLM_fairseq": 0.3792176325635977,
298
+ "Qwen2-Audio-7B-Instruct": 0.2165498391593041,
299
+ "SALMONN_7B": 0.23699946689025367,
300
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 0.21050407754683692,
301
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 0.14540692118393275
302
+ }
303
+ },
304
+ "covost2_en_zh_test": {
305
+ "bleu": {
306
+ "whisper_large_v3": 0.16408986541757878,
307
+ "Qwen-Audio-Chat": 15.330641138043728,
308
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 35.274306071307024,
309
+ "WavLLM_fairseq": 31.96381187282953,
310
+ "Qwen2-Audio-7B-Instruct": 25.765420247070075,
311
+ "SALMONN_7B": 33.88941292215531,
312
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 43.941098854450516,
313
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 5.987143868370054
314
+ }
315
+ },
316
+ "tedlium3_long_form_test": {
317
+ "wer": {
318
+ "whisper_large_v3": 0.03208650948413402,
319
+ "Qwen-Audio-Chat": 0.2911540507002305,
320
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 0.04396383619925545,
321
+ "WavLLM_fairseq": 0.4536784258110264,
322
+ "Qwen2-Audio-7B-Instruct": 0.08739585179932637,
323
+ "SALMONN_7B": 0.14231519234178336,
324
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 0.10228682857649353,
325
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 0.04754476156709803
326
+ }
327
+ },
328
+ "seame_dev_sge": {
329
+ "wer": {
330
+ "whisper_large_v3": 0.5377268970583734,
331
+ "Qwen-Audio-Chat": 1.05567969634822,
332
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 0.5840399155162387,
333
+ "WavLLM_fairseq": 1.2204842511249197,
334
+ "Qwen2-Audio-7B-Instruct": 0.5486546879304539,
335
+ "SALMONN_7B": 1.0189782362484312,
336
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 0.35550521901496834,
337
+ "gemini-1.5-flash": 1.1100431601824359,
338
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 0.507882090054792
339
+ }
340
+ },
341
+ "aishell_asr_zh_test": {
342
+ "wer": {
343
+ "whisper_large_v3": 0.12359684029221357,
344
+ "Qwen-Audio-Chat": 0.9469917443725129,
345
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 0.20886539565639167,
346
+ "WavLLM_fairseq": 0.7054601967888183,
347
+ "Qwen2-Audio-7B-Instruct": 0.09260359129694522,
348
+ "SALMONN_7B": 0.8259290055631446,
349
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 0.13165449110094832,
350
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 0.12450753301261111
351
+ }
352
+ },
353
+ "covost2_id_en_test": {
354
+ "bleu": {
355
+ "whisper_large_v3": 46.01512198258627,
356
+ "Qwen-Audio-Chat": 0.45648619714728844,
357
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 46.80524126004861,
358
+ "WavLLM_fairseq": 5.933522277713613,
359
+ "Qwen2-Audio-7B-Instruct": 6.326113431899141,
360
+ "SALMONN_7B": 26.89649039333571,
361
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 44.43289180618449,
362
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 46.79924664837527
363
+ }
364
+ },
365
+ "ytb_asr_batch2": {
366
+ "wer": {
367
+ "whisper_large_v3": 0.17210509244242622,
368
+ "Qwen-Audio-Chat": 0.4315277327278625,
369
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 0.2192622950819672,
370
+ "WavLLM_fairseq": 0.48091685587631094,
371
+ "Qwen2-Audio-7B-Instruct": 0.2080008649583739,
372
+ "SALMONN_7B": 0.3238620391393664,
373
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 0.15162720294085846,
374
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 0.23561466104443723
375
+ }
376
+ },
377
+ "imda_part5_30s_asr_test": {
378
+ "wer": {
379
+ "whisper_large_v3": 0.2143555471246589,
380
+ "Qwen-Audio-Chat": 0.3016882870525747,
381
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 0.22881615619208825,
382
+ "WavLLM_fairseq": 0.39796588405247263,
383
+ "Qwen2-Audio-7B-Instruct": 0.27856006770658537,
384
+ "SALMONN_7B": 0.34868891450584405,
385
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 0.17694182194919086,
386
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 0.22004640235805695
387
+ }
388
+ },
389
+ "parliament_short_test": {
390
+ "wer": {
391
+ "whisper_large_v3": 0.05543951935226013,
392
+ "Qwen-Audio-Chat": 0.09347360821020603,
393
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 0.07325752301384698,
394
+ "WavLLM_fairseq": 0.09512390087929656,
395
+ "Qwen2-Audio-7B-Instruct": 0.08416492612361723,
396
+ "SALMONN_7B": 0.08676929424202573,
397
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 0.056935097083623425,
398
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 0.05742502771975968
399
+ }
400
+ },
401
+ "idpc_test": {
402
+ "wer": {
403
+ "whisper_large_v3": 0.19880239520958085,
404
+ "Qwen-Audio-Chat": 0.7710863986313088,
405
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 0.16766467065868262,
406
+ "WavLLM_fairseq": 0.7686911890504705,
407
+ "Qwen2-Audio-7B-Instruct": 0.19093242087254064,
408
+ "SALMONN_7B": 0.4550898203592814,
409
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 0.30008554319931563,
410
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 0.17741659538066723
411
+ }
412
+ },
413
+ "imda_part3_30s_ds_human_test": {
414
+ "llama3_70b_judge": {
415
+ "Qwen-Audio-Chat": {
416
+ "judge_score": 16.4,
417
+ "success_rate": 1.0
418
+ },
419
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
420
+ "judge_score": 45.4,
421
+ "success_rate": 1.0
422
+ },
423
+ "WavLLM_fairseq": {
424
+ "judge_score": 31.6,
425
+ "success_rate": 1.0
426
+ },
427
+ "Qwen2-Audio-7B-Instruct": {
428
+ "judge_score": 33.8,
429
+ "success_rate": 1.0
430
+ },
431
+ "SALMONN_7B": {
432
+ "judge_score": 9.0,
433
+ "success_rate": 0.99
434
+ },
435
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
436
+ "judge_score": 48.4,
437
+ "success_rate": 0.99
438
+ },
439
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
440
+ "judge_score": 37.400000000000006,
441
+ "success_rate": 1.0
442
+ }
443
+ },
444
+ "gpt4o_judge": {
445
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
446
+ "judge_score": 47.400000000000006,
447
+ "success_rate": 1.0
448
+ }
449
+ }
450
+ },
451
+ "cn_college_listen_mcq_test": {
452
+ "llama3_70b_judge": {
453
+ "Qwen-Audio-Chat": {
454
+ "judge_score": 63.232056362835756,
455
+ "success_rate": 0.9995596653456627
456
+ },
457
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
458
+ "judge_score": 91.85380889476001,
459
+ "success_rate": 1.0
460
+ },
461
+ "WavLLM_fairseq": {
462
+ "judge_score": 66.31439894319684,
463
+ "success_rate": 1.0
464
+ },
465
+ "Qwen2-Audio-7B-Instruct": {
466
+ "judge_score": 74.7247908410392,
467
+ "success_rate": 0.9995596653456627
468
+ },
469
+ "SALMONN_7B": {
470
+ "judge_score": 50.99075297225891,
471
+ "success_rate": 1.0
472
+ },
473
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
474
+ "judge_score": 88.50726552179657,
475
+ "success_rate": 1.0
476
+ },
477
+ "gemini-1.5-flash": {
478
+ "judge_score": 89.25583443416997,
479
+ "success_rate": 0.9991193306913254
480
+ },
481
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
482
+ "judge_score": 85.2928225451343,
483
+ "success_rate": 1.0
484
+ }
485
+ }
486
+ },
487
+ "imda_part3_30s_sqa_test": {
488
+ "llama3_70b_judge": {
489
+ "Qwen-Audio-Chat": {
490
+ "judge_score": 51.08,
491
+ "success_rate": 0.998
492
+ },
493
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
494
+ "judge_score": 70.17999999999999,
495
+ "success_rate": 1.0
496
+ },
497
+ "Qwen2-Audio-7B-Instruct": {
498
+ "judge_score": 60.620000000000005,
499
+ "success_rate": 1.0
500
+ },
501
+ "SALMONN_7B": {
502
+ "judge_score": 50.8,
503
+ "success_rate": 0.999
504
+ },
505
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
506
+ "judge_score": 70.28,
507
+ "success_rate": 1.0
508
+ }
509
+ },
510
+ "gpt4o_judge": {
511
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
512
+ "judge_score": 73.0,
513
+ "success_rate": 0.999
514
+ }
515
+ }
516
+ },
517
+ "openhermes_audio_test": {
518
+ "llama3_70b_judge": {
519
+ "Qwen-Audio-Chat": {
520
+ "judge_score": 10.600000000000001,
521
+ "success_rate": 1.0
522
+ },
523
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
524
+ "judge_score": 72.2,
525
+ "success_rate": 0.96
526
+ },
527
+ "WavLLM_fairseq": {
528
+ "judge_score": 19.2,
529
+ "success_rate": 1.0
530
+ },
531
+ "Qwen2-Audio-7B-Instruct": {
532
+ "judge_score": 44.800000000000004,
533
+ "success_rate": 0.96
534
+ },
535
+ "SALMONN_7B": {
536
+ "judge_score": 15.8,
537
+ "success_rate": 1.0
538
+ },
539
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
540
+ "judge_score": 65.6,
541
+ "success_rate": 1.0
542
+ },
543
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
544
+ "judge_score": 63.0,
545
+ "success_rate": 0.93
546
+ }
547
+ },
548
+ "gpt4o_judge": {
549
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
550
+ "judge_score": 75.0,
551
+ "success_rate": 1.0
552
+ }
553
+ }
554
+ },
555
+ "imda_part5_30s_sqa_human_test": {
556
+ "llama3_70b_judge": {
557
+ "Qwen-Audio-Chat": {
558
+ "judge_score": 47.800000000000004,
559
+ "success_rate": 1.0
560
+ },
561
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
562
+ "judge_score": 74.0,
563
+ "success_rate": 1.0
564
+ },
565
+ "WavLLM_fairseq": {
566
+ "judge_score": 50.8,
567
+ "success_rate": 0.99
568
+ },
569
+ "Qwen2-Audio-7B-Instruct": {
570
+ "judge_score": 51.6,
571
+ "success_rate": 1.0
572
+ },
573
+ "SALMONN_7B": {
574
+ "judge_score": 44.6,
575
+ "success_rate": 1.0
576
+ },
577
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
578
+ "judge_score": 64.80000000000001,
579
+ "success_rate": 1.0
580
+ },
581
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
582
+ "judge_score": 57.800000000000004,
583
+ "success_rate": 1.0
584
+ }
585
+ },
586
+ "gpt4o_judge": {
587
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
588
+ "judge_score": 64.80000000000001,
589
+ "success_rate": 1.0
590
+ }
591
+ }
592
+ },
593
+ "slue_p2_sqa5_test": {
594
+ "llama3_70b_judge": {
595
+ "Qwen-Audio-Chat": {
596
+ "judge_score": 79.36274509803921,
597
+ "success_rate": 0.9975490196078431
598
+ },
599
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
600
+ "judge_score": 88.57843137254902,
601
+ "success_rate": 1.0
602
+ },
603
+ "WavLLM_fairseq": {
604
+ "judge_score": 83.92156862745098,
605
+ "success_rate": 1.0
606
+ },
607
+ "Qwen2-Audio-7B-Instruct": {
608
+ "judge_score": 80.04901960784315,
609
+ "success_rate": 1.0
610
+ },
611
+ "SALMONN_7B": {
612
+ "judge_score": 83.48039215686273,
613
+ "success_rate": 1.0
614
+ },
615
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
616
+ "judge_score": 86.76470588235293,
617
+ "success_rate": 1.0
618
+ },
619
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
620
+ "judge_score": 82.99019607843137,
621
+ "success_rate": 1.0
622
+ }
623
+ },
624
+ "gpt4o_judge": {
625
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
626
+ "judge_score": 87.79411764705883,
627
+ "success_rate": 1.0
628
+ }
629
+ }
630
+ },
631
+ "ytb_sds_batch1": {
632
+ "llama3_70b_judge": {
633
+ "Qwen-Audio-Chat": {
634
+ "judge_score": 43.878954607977995,
635
+ "success_rate": 0.9917469050894085
636
+ },
637
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
638
+ "judge_score": 64.12654745529574,
639
+ "success_rate": 0.9986244841815681
640
+ },
641
+ "WavLLM_fairseq": {
642
+ "judge_score": 55.625859697386524,
643
+ "success_rate": 0.9917469050894085
644
+ },
645
+ "Qwen2-Audio-7B-Instruct": {
646
+ "judge_score": 51.5818431911967,
647
+ "success_rate": 0.9986244841815681
648
+ },
649
+ "SALMONN_7B": {
650
+ "judge_score": 31.279229711141674,
651
+ "success_rate": 0.9972489683631361
652
+ },
653
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
654
+ "judge_score": 53.97524071526823,
655
+ "success_rate": 0.9944979367262724
656
+ },
657
+ "gemini-1.5-flash": {
658
+ "judge_score": 65.9697386519945,
659
+ "success_rate": 0.9931224209078404
660
+ },
661
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
662
+ "judge_score": 59.44979367262724,
663
+ "success_rate": 0.9972489683631361
664
+ }
665
+ }
666
+ },
667
+ "voxceleb_gender_test": {
668
+ "llama3_70b_judge": {
669
+ "Qwen-Audio-Chat": {
670
+ "judge_score": 70.5990972507181,
671
+ "success_rate": 0.9997948297086582
672
+ },
673
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
674
+ "judge_score": 34.94050061551087,
675
+ "success_rate": 1.0
676
+ },
677
+ "WavLLM_fairseq": {
678
+ "judge_score": 69.61427985227739,
679
+ "success_rate": 1.0
680
+ },
681
+ "Qwen2-Audio-7B-Instruct": {
682
+ "judge_score": 99.1177677472302,
683
+ "success_rate": 1.0
684
+ },
685
+ "SALMONN_7B": {
686
+ "judge_score": 88.79770209273697,
687
+ "success_rate": 1.0
688
+ },
689
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
690
+ "judge_score": 99.75379565038982,
691
+ "success_rate": 1.0
692
+ },
693
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
694
+ "judge_score": 42.921624948707425,
695
+ "success_rate": 1.0
696
+ }
697
+ }
698
+ },
699
+ "dream_tts_mcq_test": {
700
+ "llama3_70b_judge": {
701
+ "Qwen-Audio-Chat": {
702
+ "judge_score": 59.749085206481965,
703
+ "success_rate": 1.0
704
+ },
705
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
706
+ "judge_score": 89.33612127548353,
707
+ "success_rate": 1.0
708
+ },
709
+ "WavLLM_fairseq": {
710
+ "judge_score": 66.5446941975954,
711
+ "success_rate": 0.9984317825405122
712
+ },
713
+ "Qwen2-Audio-7B-Instruct": {
714
+ "judge_score": 66.49242028227914,
715
+ "success_rate": 0.9994772608468374
716
+ },
717
+ "SALMONN_7B": {
718
+ "judge_score": 56.455828541557764,
719
+ "success_rate": 1.0
720
+ },
721
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
722
+ "judge_score": 84.31782540512285,
723
+ "success_rate": 1.0
724
+ },
725
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
726
+ "judge_score": 86.4610559330894,
727
+ "success_rate": 1.0
728
+ }
729
+ }
730
+ },
731
+ "ytb_sqa_batch1": {
732
+ "llama3_70b_judge": {
733
+ "Qwen-Audio-Chat": {
734
+ "judge_score": 60.827586206896555,
735
+ "success_rate": 0.9980295566502463
736
+ },
737
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
738
+ "judge_score": 70.18719211822659,
739
+ "success_rate": 1.0
740
+ },
741
+ "WavLLM_fairseq": {
742
+ "judge_score": 60.70935960591133,
743
+ "success_rate": 1.0
744
+ },
745
+ "Qwen2-Audio-7B-Instruct": {
746
+ "judge_score": 60.453201970443345,
747
+ "success_rate": 0.9980295566502463
748
+ },
749
+ "SALMONN_7B": {
750
+ "judge_score": 55.665024630541865,
751
+ "success_rate": 0.9990147783251232
752
+ },
753
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
754
+ "judge_score": 64.51231527093596,
755
+ "success_rate": 0.9980295566502463
756
+ },
757
+ "gemini-1.5-flash": {
758
+ "judge_score": 78.06896551724138,
759
+ "success_rate": 0.9980295566502463
760
+ },
761
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
762
+ "judge_score": 67.3103448275862,
763
+ "success_rate": 1.0
764
+ }
765
+ }
766
+ },
767
+ "spoken_squad_test": {
768
+ "llama3_70b_judge": {
769
+ "Qwen-Audio-Chat": {
770
+ "judge_score": 64.8327415436367,
771
+ "success_rate": 0.9990655952158475
772
+ },
773
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
774
+ "judge_score": 88.61894972902262,
775
+ "success_rate": 0.9998131190431695
776
+ },
777
+ "WavLLM_fairseq": {
778
+ "judge_score": 77.64903756307233,
779
+ "success_rate": 0.997383666604373
780
+ },
781
+ "Qwen2-Audio-7B-Instruct": {
782
+ "judge_score": 64.86264249672958,
783
+ "success_rate": 0.9971967856475425
784
+ },
785
+ "SALMONN_7B": {
786
+ "judge_score": 66.39506634273968,
787
+ "success_rate": 0.9994393571295085
788
+ },
789
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
790
+ "judge_score": 73.66473556344609,
791
+ "success_rate": 0.999252476172678
792
+ },
793
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
794
+ "judge_score": 83.81984675761541,
795
+ "success_rate": 0.998131190431695
796
+ }
797
+ },
798
+ "gpt4o_judge": {
799
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
800
+ "judge_score": 90.12521024107643,
801
+ "success_rate": 1.0
802
+ }
803
+ }
804
+ },
805
+ "imda_part4_30s_sqa_test": {
806
+ "llama3_70b_judge": {
807
+ "Qwen-Audio-Chat": {
808
+ "judge_score": 41.92,
809
+ "success_rate": 0.999
810
+ },
811
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
812
+ "judge_score": 66.34,
813
+ "success_rate": 1.0
814
+ },
815
+ "Qwen2-Audio-7B-Instruct": {
816
+ "judge_score": 50.279999999999994,
817
+ "success_rate": 0.999
818
+ },
819
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
820
+ "judge_score": 61.980000000000004,
821
+ "success_rate": 1.0
822
+ }
823
+ },
824
+ "gpt4o_judge": {
825
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
826
+ "judge_score": 64.9,
827
+ "success_rate": 1.0
828
+ }
829
+ }
830
+ },
831
+ "imda_gr_dialogue": {
832
+ "llama3_70b_judge": {
833
+ "Qwen-Audio-Chat": {
834
+ "judge_score": 37.2,
835
+ "success_rate": 0.9996666666666667
836
+ },
837
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
838
+ "judge_score": 19.6,
839
+ "success_rate": 1.0
840
+ },
841
+ "WavLLM_fairseq": {
842
+ "judge_score": 46.766666666666666,
843
+ "success_rate": 1.0
844
+ },
845
+ "Qwen2-Audio-7B-Instruct": {
846
+ "judge_score": 61.56666666666667,
847
+ "success_rate": 0.9996666666666667
848
+ },
849
+ "SALMONN_7B": {
850
+ "judge_score": 42.733333333333334,
851
+ "success_rate": 0.9993333333333333
852
+ },
853
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
854
+ "judge_score": 93.76666666666667,
855
+ "success_rate": 1.0
856
+ },
857
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
858
+ "judge_score": 25.433333333333337,
859
+ "success_rate": 0.9996666666666667
860
+ }
861
+ }
862
+ },
863
+ "imda_ar_dialogue": {
864
+ "llama3_70b_judge": {
865
+ "Qwen-Audio-Chat": {
866
+ "judge_score": 0.6666666666666667,
867
+ "success_rate": 0.9996666666666667
868
+ },
869
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
870
+ "judge_score": 7.633333333333334,
871
+ "success_rate": 1.0
872
+ },
873
+ "WavLLM_fairseq": {
874
+ "judge_score": 0.23333333333333336,
875
+ "success_rate": 0.9996666666666667
876
+ },
877
+ "Qwen2-Audio-7B-Instruct": {
878
+ "judge_score": 0.9666666666666667,
879
+ "success_rate": 1.0
880
+ },
881
+ "SALMONN_7B": {
882
+ "judge_score": 0.06666666666666667,
883
+ "success_rate": 1.0
884
+ },
885
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
886
+ "judge_score": 77.83333333333333,
887
+ "success_rate": 1.0
888
+ },
889
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
890
+ "judge_score": 9.666666666666666,
891
+ "success_rate": 0.9986666666666667
892
+ }
893
+ }
894
+ },
895
+ "audiocaps_test": {
896
+ "meteor": {
897
+ "Qwen-Audio-Chat": 0.27553015076950976,
898
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 0.05796819723943051,
899
+ "WavLLM_fairseq": 0.041732965094428545,
900
+ "Qwen2-Audio-7B-Instruct": 0.19891712076314283,
901
+ "SALMONN_7B": 0.20994052484339956,
902
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 0.24920047034353812,
903
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 0.07953048457785493
904
+ },
905
+ "llama3_70b_judge": {
906
+ "Qwen-Audio-Chat": {
907
+ "judge_score": 47.04090909090909,
908
+ "success_rate": 0.9990909090909091
909
+ },
910
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
911
+ "judge_score": 3.0954545454545457,
912
+ "success_rate": 0.9995454545454545
913
+ },
914
+ "WavLLM_fairseq": {
915
+ "judge_score": 5.5,
916
+ "success_rate": 0.9977272727272727
917
+ },
918
+ "Qwen2-Audio-7B-Instruct": {
919
+ "judge_score": 40.77727272727273,
920
+ "success_rate": 0.9977272727272727
921
+ },
922
+ "SALMONN_7B": {
923
+ "judge_score": 37.445454545454545,
924
+ "success_rate": 0.9988636363636364
925
+ },
926
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
927
+ "judge_score": 38.00454545454545,
928
+ "success_rate": 0.9997727272727273
929
+ },
930
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
931
+ "judge_score": 2.4727272727272727,
932
+ "success_rate": 0.9997727272727273
933
+ }
934
+ },
935
+ "gpt4o_judge": {
936
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
937
+ "judge_score": 4.868181818181818,
938
+ "success_rate": 0.9981818181818182
939
+ }
940
+ }
941
+ },
942
+ "imda_part5_30s_ds_test": {
943
+ "llama3_70b_judge": {
944
+ "Qwen-Audio-Chat": {
945
+ "judge_score": 39.14,
946
+ "success_rate": 0.996
947
+ },
948
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
949
+ "judge_score": 61.48,
950
+ "success_rate": 0.996
951
+ },
952
+ "Qwen2-Audio-7B-Instruct": {
953
+ "judge_score": 45.38,
954
+ "success_rate": 0.997
955
+ },
956
+ "SALMONN_7B": {
957
+ "judge_score": 24.340000000000003,
958
+ "success_rate": 0.998
959
+ },
960
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
961
+ "judge_score": 54.379999999999995,
962
+ "success_rate": 0.998
963
+ }
964
+ },
965
+ "gpt4o_judge": {
966
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
967
+ "judge_score": 63.68000000000001,
968
+ "success_rate": 1.0
969
+ }
970
+ }
971
+ },
972
+ "ytb_pqa_batch1": {
973
+ "llama3_70b_judge": {
974
+ "Qwen-Audio-Chat": {
975
+ "judge_score": 37.16117216117216,
976
+ "success_rate": 0.9990842490842491
977
+ },
978
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
979
+ "judge_score": 55.01831501831502,
980
+ "success_rate": 0.9990842490842491
981
+ },
982
+ "WavLLM_fairseq": {
983
+ "judge_score": 40.95238095238095,
984
+ "success_rate": 1.0
985
+ },
986
+ "Qwen2-Audio-7B-Instruct": {
987
+ "judge_score": 36.97802197802198,
988
+ "success_rate": 0.9981684981684982
989
+ },
990
+ "SALMONN_7B": {
991
+ "judge_score": 32.124542124542124,
992
+ "success_rate": 1.0
993
+ },
994
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
995
+ "judge_score": 40.97069597069597,
996
+ "success_rate": 0.9990842490842491
997
+ },
998
+ "gemini-1.5-flash": {
999
+ "judge_score": 49.908424908424905,
1000
+ "success_rate": 0.9972527472527473
1001
+ },
1002
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1003
+ "judge_score": 52.252747252747255,
1004
+ "success_rate": 0.9990842490842491
1005
+ }
1006
+ }
1007
+ },
1008
+ "imda_ar_sentence": {
1009
+ "llama3_70b_judge": {
1010
+ "Qwen-Audio-Chat": {
1011
+ "judge_score": 3.933333333333333,
1012
+ "success_rate": 0.9996666666666667
1013
+ },
1014
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
1015
+ "judge_score": 26.016666666666666,
1016
+ "success_rate": 0.9998333333333334
1017
+ },
1018
+ "WavLLM_fairseq": {
1019
+ "judge_score": 2.6833333333333336,
1020
+ "success_rate": 0.999
1021
+ },
1022
+ "Qwen2-Audio-7B-Instruct": {
1023
+ "judge_score": 2.55,
1024
+ "success_rate": 0.9998333333333334
1025
+ },
1026
+ "SALMONN_7B": {
1027
+ "judge_score": 2.5166666666666666,
1028
+ "success_rate": 0.999
1029
+ },
1030
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
1031
+ "judge_score": 7.816666666666666,
1032
+ "success_rate": 0.9995
1033
+ },
1034
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1035
+ "judge_score": 12.416666666666666,
1036
+ "success_rate": 0.9995
1037
+ }
1038
+ }
1039
+ },
1040
+ "imda_part6_30s_sqa_human_test": {
1041
+ "llama3_70b_judge": {
1042
+ "Qwen-Audio-Chat": {
1043
+ "judge_score": 51.4,
1044
+ "success_rate": 1.0
1045
+ },
1046
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
1047
+ "judge_score": 71.6,
1048
+ "success_rate": 1.0
1049
+ },
1050
+ "WavLLM_fairseq": {
1051
+ "judge_score": 62.199999999999996,
1052
+ "success_rate": 1.0
1053
+ },
1054
+ "Qwen2-Audio-7B-Instruct": {
1055
+ "judge_score": 53.6,
1056
+ "success_rate": 1.0
1057
+ },
1058
+ "SALMONN_7B": {
1059
+ "judge_score": 46.8,
1060
+ "success_rate": 1.0
1061
+ },
1062
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
1063
+ "judge_score": 67.2,
1064
+ "success_rate": 1.0
1065
+ },
1066
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1067
+ "judge_score": 64.0,
1068
+ "success_rate": 1.0
1069
+ }
1070
+ },
1071
+ "gpt4o_judge": {
1072
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1073
+ "judge_score": 67.0,
1074
+ "success_rate": 1.0
1075
+ }
1076
+ }
1077
+ },
1078
+ "imda_gr_sentence": {
1079
+ "llama3_70b_judge": {
1080
+ "Qwen-Audio-Chat": {
1081
+ "judge_score": 57.550000000000004,
1082
+ "success_rate": 1.0
1083
+ },
1084
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
1085
+ "judge_score": 26.35,
1086
+ "success_rate": 1.0
1087
+ },
1088
+ "WavLLM_fairseq": {
1089
+ "judge_score": 49.06666666666666,
1090
+ "success_rate": 0.9996666666666667
1091
+ },
1092
+ "Qwen2-Audio-7B-Instruct": {
1093
+ "judge_score": 68.38333333333333,
1094
+ "success_rate": 0.9996666666666667
1095
+ },
1096
+ "SALMONN_7B": {
1097
+ "judge_score": 59.766666666666666,
1098
+ "success_rate": 1.0
1099
+ },
1100
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
1101
+ "judge_score": 66.13333333333333,
1102
+ "success_rate": 1.0
1103
+ },
1104
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1105
+ "judge_score": 36.016666666666666,
1106
+ "success_rate": 1.0
1107
+ }
1108
+ }
1109
+ },
1110
+ "imda_part4_30s_ds_test": {
1111
+ "llama3_70b_judge": {
1112
+ "Qwen-Audio-Chat": {
1113
+ "judge_score": 18.060000000000002,
1114
+ "success_rate": 0.994
1115
+ },
1116
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
1117
+ "judge_score": 43.4,
1118
+ "success_rate": 0.999
1119
+ },
1120
+ "Qwen2-Audio-7B-Instruct": {
1121
+ "judge_score": 25.019999999999996,
1122
+ "success_rate": 0.998
1123
+ },
1124
+ "SALMONN_7B": {
1125
+ "judge_score": 9.399999999999999,
1126
+ "success_rate": 0.999
1127
+ },
1128
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1129
+ "judge_score": 37.879999999999995,
1130
+ "success_rate": 0.993
1131
+ }
1132
+ },
1133
+ "gpt4o_judge": {
1134
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1135
+ "judge_score": 47.74,
1136
+ "success_rate": 0.999
1137
+ }
1138
+ }
1139
+ },
1140
+ "meld_emotion_test": {
1141
+ "llama3_70b_judge": {
1142
+ "Qwen-Audio-Chat": {
1143
+ "judge_score": 50.72796934865901,
1144
+ "success_rate": 1.0
1145
+ },
1146
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
1147
+ "judge_score": 47.356321839080465,
1148
+ "success_rate": 1.0
1149
+ },
1150
+ "WavLLM_fairseq": {
1151
+ "judge_score": 41.57088122605364,
1152
+ "success_rate": 1.0
1153
+ },
1154
+ "Qwen2-Audio-7B-Instruct": {
1155
+ "judge_score": 41.60919540229885,
1156
+ "success_rate": 1.0
1157
+ },
1158
+ "SALMONN_7B": {
1159
+ "judge_score": 30.536398467432953,
1160
+ "success_rate": 1.0
1161
+ },
1162
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
1163
+ "judge_score": 36.36015325670498,
1164
+ "success_rate": 1.0
1165
+ },
1166
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1167
+ "judge_score": 36.81992337164751,
1168
+ "success_rate": 1.0
1169
+ }
1170
+ }
1171
+ },
1172
+ "muchomusic_test": {
1173
+ "llama3_70b_judge": {
1174
+ "Qwen-Audio-Chat": {
1175
+ "judge_score": 59.0564448188711,
1176
+ "success_rate": 0.9991575400168492
1177
+ },
1178
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
1179
+ "judge_score": 51.727042965459134,
1180
+ "success_rate": 1.0
1181
+ },
1182
+ "WavLLM_fairseq": {
1183
+ "judge_score": 44.3133951137321,
1184
+ "success_rate": 1.0
1185
+ },
1186
+ "Qwen2-Audio-7B-Instruct": {
1187
+ "judge_score": 71.60909856781802,
1188
+ "success_rate": 1.0
1189
+ },
1190
+ "SALMONN_7B": {
1191
+ "judge_score": 50.88458298230834,
1192
+ "success_rate": 1.0
1193
+ },
1194
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
1195
+ "judge_score": 57.7927548441449,
1196
+ "success_rate": 1.0
1197
+ },
1198
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1199
+ "judge_score": 56.44481887110362,
1200
+ "success_rate": 1.0
1201
+ }
1202
+ }
1203
+ },
1204
+ "imda_part6_30s_ds_test": {
1205
+ "llama3_70b_judge": {
1206
+ "Qwen-Audio-Chat": {
1207
+ "judge_score": 43.84,
1208
+ "success_rate": 0.993
1209
+ },
1210
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
1211
+ "judge_score": 65.6,
1212
+ "success_rate": 0.996
1213
+ },
1214
+ "Qwen2-Audio-7B-Instruct": {
1215
+ "judge_score": 48.38,
1216
+ "success_rate": 0.999
1217
+ },
1218
+ "SALMONN_7B": {
1219
+ "judge_score": 27.12,
1220
+ "success_rate": 1.0
1221
+ },
1222
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1223
+ "judge_score": 59.2,
1224
+ "success_rate": 0.999
1225
+ }
1226
+ },
1227
+ "gpt4o_judge": {
1228
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1229
+ "judge_score": 67.58,
1230
+ "success_rate": 1.0
1231
+ }
1232
+ }
1233
+ },
1234
+ "clotho_aqa_test": {
1235
+ "llama3_70b_judge": {
1236
+ "Qwen-Audio-Chat": {
1237
+ "judge_score": 61.934856587263,
1238
+ "success_rate": 1.0
1239
+ },
1240
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
1241
+ "judge_score": 24.647544968400585,
1242
+ "success_rate": 1.0
1243
+ },
1244
+ "WavLLM_fairseq": {
1245
+ "judge_score": 43.01199466903598,
1246
+ "success_rate": 0.998223011994669
1247
+ },
1248
+ "Qwen2-Audio-7B-Instruct": {
1249
+ "judge_score": 50.919591292758774,
1250
+ "success_rate": 0.9991115059973346
1251
+ },
1252
+ "SALMONN_7B": {
1253
+ "judge_score": 57.75401069518716,
1254
+ "success_rate": 1.0
1255
+ },
1256
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
1257
+ "judge_score": 63.15021876519203,
1258
+ "success_rate": 1.0
1259
+ },
1260
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1261
+ "judge_score": 29.47134606841404,
1262
+ "success_rate": 0.9991115059973346
1263
+ }
1264
+ },
1265
+ "gpt4o_judge": {
1266
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1267
+ "judge_score": 28.076410484229232,
1268
+ "success_rate": 1.0
1269
+ }
1270
+ }
1271
+ },
1272
+ "imda_part3_30s_sqa_human_test": {
1273
+ "llama3_70b_judge": {
1274
+ "Qwen-Audio-Chat": {
1275
+ "judge_score": 32.2,
1276
+ "success_rate": 1.0
1277
+ },
1278
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
1279
+ "judge_score": 56.0,
1280
+ "success_rate": 1.0
1281
+ },
1282
+ "WavLLM_fairseq": {
1283
+ "judge_score": 45.199999999999996,
1284
+ "success_rate": 1.0
1285
+ },
1286
+ "Qwen2-Audio-7B-Instruct": {
1287
+ "judge_score": 42.0,
1288
+ "success_rate": 1.0
1289
+ },
1290
+ "SALMONN_7B": {
1291
+ "judge_score": 40.599999999999994,
1292
+ "success_rate": 1.0
1293
+ },
1294
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
1295
+ "judge_score": 51.4,
1296
+ "success_rate": 1.0
1297
+ },
1298
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1299
+ "judge_score": 49.0,
1300
+ "success_rate": 1.0
1301
+ }
1302
+ },
1303
+ "gpt4o_judge": {
1304
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1305
+ "judge_score": 52.800000000000004,
1306
+ "success_rate": 1.0
1307
+ }
1308
+ }
1309
+ },
1310
+ "imda_part6_30s_sqa_test": {
1311
+ "llama3_70b_judge": {
1312
+ "Qwen-Audio-Chat": {
1313
+ "judge_score": 63.040000000000006,
1314
+ "success_rate": 0.998
1315
+ },
1316
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
1317
+ "judge_score": 83.08,
1318
+ "success_rate": 1.0
1319
+ },
1320
+ "Qwen2-Audio-7B-Instruct": {
1321
+ "judge_score": 69.42,
1322
+ "success_rate": 0.998
1323
+ },
1324
+ "SALMONN_7B": {
1325
+ "judge_score": 66.86,
1326
+ "success_rate": 1.0
1327
+ },
1328
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1329
+ "judge_score": 80.60000000000001,
1330
+ "success_rate": 1.0
1331
+ }
1332
+ },
1333
+ "gpt4o_judge": {
1334
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1335
+ "judge_score": 81.8,
1336
+ "success_rate": 0.999
1337
+ }
1338
+ }
1339
+ },
1340
+ "imda_30s_ds_test": {
1341
+ "llama3_70b_judge": {
1342
+ "Qwen-Audio-Chat": {
1343
+ "judge_score": 31.295,
1344
+ "success_rate": 0.99625
1345
+ },
1346
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
1347
+ "judge_score": 54.515,
1348
+ "success_rate": 0.99575
1349
+ },
1350
+ "Qwen2-Audio-7B-Instruct": {
1351
+ "judge_score": 38.915,
1352
+ "success_rate": 0.99775
1353
+ },
1354
+ "SALMONN_7B": {
1355
+ "judge_score": 18.345,
1356
+ "success_rate": 0.999
1357
+ },
1358
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1359
+ "judge_score": 48.269999999999996,
1360
+ "success_rate": 0.998
1361
+ }
1362
+ },
1363
+ "gpt4o_judge": {
1364
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1365
+ "judge_score": 57.99,
1366
+ "success_rate": 0.99975
1367
+ }
1368
+ }
1369
+ },
1370
+ "iemocap_emotion_test": {
1371
+ "llama3_70b_judge": {
1372
+ "Qwen-Audio-Chat": {
1373
+ "judge_score": 29.382470119521916,
1374
+ "success_rate": 1.0
1375
+ },
1376
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
1377
+ "judge_score": 44.322709163346616,
1378
+ "success_rate": 0.99800796812749
1379
+ },
1380
+ "WavLLM_fairseq": {
1381
+ "judge_score": 59.76095617529881,
1382
+ "success_rate": 0.999003984063745
1383
+ },
1384
+ "Qwen2-Audio-7B-Instruct": {
1385
+ "judge_score": 53.98406374501992,
1386
+ "success_rate": 1.0
1387
+ },
1388
+ "SALMONN_7B": {
1389
+ "judge_score": 23.804780876494025,
1390
+ "success_rate": 1.0
1391
+ },
1392
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
1393
+ "judge_score": 48.505976095617534,
1394
+ "success_rate": 1.0
1395
+ },
1396
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1397
+ "judge_score": 46.713147410358566,
1398
+ "success_rate": 1.0
1399
+ }
1400
+ }
1401
+ },
1402
+ "imda_part6_30s_ds_human_test": {
1403
+ "llama3_70b_judge": {
1404
+ "Qwen-Audio-Chat": {
1405
+ "judge_score": 40.4,
1406
+ "success_rate": 1.0
1407
+ },
1408
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
1409
+ "judge_score": 65.4,
1410
+ "success_rate": 1.0
1411
+ },
1412
+ "WavLLM_fairseq": {
1413
+ "judge_score": 49.400000000000006,
1414
+ "success_rate": 1.0
1415
+ },
1416
+ "Qwen2-Audio-7B-Instruct": {
1417
+ "judge_score": 46.2,
1418
+ "success_rate": 1.0
1419
+ },
1420
+ "SALMONN_7B": {
1421
+ "judge_score": 24.2,
1422
+ "success_rate": 1.0
1423
+ },
1424
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
1425
+ "judge_score": 62.599999999999994,
1426
+ "success_rate": 1.0
1427
+ },
1428
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1429
+ "judge_score": 57.199999999999996,
1430
+ "success_rate": 1.0
1431
+ }
1432
+ },
1433
+ "gpt4o_judge": {
1434
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1435
+ "judge_score": 64.4,
1436
+ "success_rate": 1.0
1437
+ }
1438
+ }
1439
+ },
1440
+ "imda_30s_sqa_test": {
1441
+ "llama3_70b_judge": {
1442
+ "Qwen-Audio-Chat": {
1443
+ "judge_score": 54.669999999999995,
1444
+ "success_rate": 0.99875
1445
+ },
1446
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
1447
+ "judge_score": 75.09,
1448
+ "success_rate": 0.99875
1449
+ },
1450
+ "Qwen2-Audio-7B-Instruct": {
1451
+ "judge_score": 62.190000000000005,
1452
+ "success_rate": 0.99925
1453
+ },
1454
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1455
+ "judge_score": 72.475,
1456
+ "success_rate": 0.99925
1457
+ }
1458
+ },
1459
+ "gpt4o_judge": {
1460
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1461
+ "judge_score": 75.11999999999999,
1462
+ "success_rate": 0.9995
1463
+ }
1464
+ }
1465
+ },
1466
+ "wavcaps_qa_test": {
1467
+ "llama3_70b_judge": {
1468
+ "Qwen-Audio-Chat": {
1469
+ "judge_score": 42.69736842105263,
1470
+ "success_rate": 1.0
1471
+ },
1472
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
1473
+ "judge_score": 18.88157894736842,
1474
+ "success_rate": 1.0
1475
+ },
1476
+ "WavLLM_fairseq": {
1477
+ "judge_score": 26.25,
1478
+ "success_rate": 0.9967105263157895
1479
+ },
1480
+ "Qwen2-Audio-7B-Instruct": {
1481
+ "judge_score": 44.473684210526315,
1482
+ "success_rate": 0.9967105263157895
1483
+ },
1484
+ "SALMONN_7B": {
1485
+ "judge_score": 47.30263157894737,
1486
+ "success_rate": 1.0
1487
+ },
1488
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
1489
+ "judge_score": 46.31578947368421,
1490
+ "success_rate": 1.0
1491
+ },
1492
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1493
+ "judge_score": 16.710526315789473,
1494
+ "success_rate": 1.0
1495
+ }
1496
+ },
1497
+ "gpt4o_judge": {
1498
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1499
+ "judge_score": 14.736842105263158,
1500
+ "success_rate": 1.0
1501
+ }
1502
+ }
1503
+ },
1504
+ "wavcaps_test": {
1505
+ "meteor": {
1506
+ "Qwen-Audio-Chat": 0.2355106805560457,
1507
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": 0.120421856260385,
1508
+ "WavLLM_fairseq": 0.06399522524688675,
1509
+ "Qwen2-Audio-7B-Instruct": 0.21342294856199182,
1510
+ "SALMONN_7B": 0.17175112770658157,
1511
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": 0.3175511907248581,
1512
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 0.1388630786594543
1513
+ },
1514
+ "llama3_70b_judge": {
1515
+ "Qwen-Audio-Chat": {
1516
+ "judge_score": 32.9364161849711,
1517
+ "success_rate": 0.999421965317919
1518
+ },
1519
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
1520
+ "judge_score": 6.3468208092485545,
1521
+ "success_rate": 1.0
1522
+ },
1523
+ "WavLLM_fairseq": {
1524
+ "judge_score": 6.901734104046243,
1525
+ "success_rate": 0.9976878612716763
1526
+ },
1527
+ "Qwen2-Audio-7B-Instruct": {
1528
+ "judge_score": 33.78034682080925,
1529
+ "success_rate": 0.9976878612716763
1530
+ },
1531
+ "SALMONN_7B": {
1532
+ "judge_score": 23.76878612716763,
1533
+ "success_rate": 0.999421965317919
1534
+ },
1535
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
1536
+ "judge_score": 33.97687861271676,
1537
+ "success_rate": 0.999421965317919
1538
+ },
1539
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1540
+ "judge_score": 3.445086705202312,
1541
+ "success_rate": 0.9988439306358381
1542
+ }
1543
+ },
1544
+ "gpt4o_judge": {
1545
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1546
+ "judge_score": 4.61271676300578,
1547
+ "success_rate": 0.999421965317919
1548
+ }
1549
+ }
1550
+ },
1551
+ "imda_part3_30s_ds_test": {
1552
+ "llama3_70b_judge": {
1553
+ "Qwen-Audio-Chat": {
1554
+ "judge_score": 25.22,
1555
+ "success_rate": 0.997
1556
+ },
1557
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
1558
+ "judge_score": 48.339999999999996,
1559
+ "success_rate": 0.998
1560
+ },
1561
+ "WavLLM_fairseq": {
1562
+ "judge_score": 36.5,
1563
+ "success_rate": 0.997
1564
+ },
1565
+ "Qwen2-Audio-7B-Instruct": {
1566
+ "judge_score": 35.54,
1567
+ "success_rate": 0.996
1568
+ },
1569
+ "SALMONN_7B": {
1570
+ "judge_score": 12.82,
1571
+ "success_rate": 0.998
1572
+ },
1573
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1574
+ "judge_score": 42.32,
1575
+ "success_rate": 0.998
1576
+ }
1577
+ },
1578
+ "gpt4o_judge": {
1579
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1580
+ "judge_score": 52.38,
1581
+ "success_rate": 1.0
1582
+ }
1583
+ }
1584
+ },
1585
+ "meld_sentiment_test": {
1586
+ "llama3_70b_judge": {
1587
+ "Qwen-Audio-Chat": {
1588
+ "judge_score": 44.90421455938697,
1589
+ "success_rate": 1.0
1590
+ },
1591
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
1592
+ "judge_score": 56.59003831417625,
1593
+ "success_rate": 1.0
1594
+ },
1595
+ "WavLLM_fairseq": {
1596
+ "judge_score": 51.072796934865906,
1597
+ "success_rate": 0.9996168582375479
1598
+ },
1599
+ "Qwen2-Audio-7B-Instruct": {
1600
+ "judge_score": 53.9463601532567,
1601
+ "success_rate": 1.0
1602
+ },
1603
+ "SALMONN_7B": {
1604
+ "judge_score": 41.7624521072797,
1605
+ "success_rate": 0.9996168582375479
1606
+ },
1607
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
1608
+ "judge_score": 46.206896551724135,
1609
+ "success_rate": 1.0
1610
+ },
1611
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1612
+ "judge_score": 45.593869731800766,
1613
+ "success_rate": 0.9996168582375479
1614
+ }
1615
+ }
1616
+ },
1617
+ "imda_part5_30s_ds_human_test": {
1618
+ "llama3_70b_judge": {
1619
+ "Qwen-Audio-Chat": {
1620
+ "judge_score": 28.2,
1621
+ "success_rate": 1.0
1622
+ },
1623
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
1624
+ "judge_score": 58.0,
1625
+ "success_rate": 1.0
1626
+ },
1627
+ "WavLLM_fairseq": {
1628
+ "judge_score": 45.199999999999996,
1629
+ "success_rate": 1.0
1630
+ },
1631
+ "Qwen2-Audio-7B-Instruct": {
1632
+ "judge_score": 40.4,
1633
+ "success_rate": 1.0
1634
+ },
1635
+ "SALMONN_7B": {
1636
+ "judge_score": 17.2,
1637
+ "success_rate": 0.99
1638
+ },
1639
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
1640
+ "judge_score": 57.0,
1641
+ "success_rate": 0.99
1642
+ },
1643
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1644
+ "judge_score": 49.0,
1645
+ "success_rate": 0.99
1646
+ }
1647
+ },
1648
+ "gpt4o_judge": {
1649
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1650
+ "judge_score": 56.8,
1651
+ "success_rate": 1.0
1652
+ }
1653
+ }
1654
+ },
1655
+ "imda_part5_30s_sqa_test": {
1656
+ "llama3_70b_judge": {
1657
+ "Qwen-Audio-Chat": {
1658
+ "judge_score": 61.260000000000005,
1659
+ "success_rate": 1.0
1660
+ },
1661
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
1662
+ "judge_score": 80.34,
1663
+ "success_rate": 0.999
1664
+ },
1665
+ "Qwen2-Audio-7B-Instruct": {
1666
+ "judge_score": 68.52000000000001,
1667
+ "success_rate": 0.999
1668
+ },
1669
+ "SALMONN_7B": {
1670
+ "judge_score": 62.62,
1671
+ "success_rate": 1.0
1672
+ },
1673
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1674
+ "judge_score": 76.56,
1675
+ "success_rate": 1.0
1676
+ }
1677
+ },
1678
+ "gpt4o_judge": {
1679
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1680
+ "judge_score": 80.36,
1681
+ "success_rate": 1.0
1682
+ }
1683
+ }
1684
+ },
1685
+ "voxceleb_accent_test": {
1686
+ "llama3_70b_judge": {
1687
+ "Qwen-Audio-Chat": {
1688
+ "judge_score": 48.05088223225277,
1689
+ "success_rate": 0.9995896594173164
1690
+ },
1691
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
1692
+ "judge_score": 24.640951990151827,
1693
+ "success_rate": 0.9997948297086582
1694
+ },
1695
+ "WavLLM_fairseq": {
1696
+ "judge_score": 39.96717275338531,
1697
+ "success_rate": 0.9993844891259746
1698
+ },
1699
+ "Qwen2-Audio-7B-Instruct": {
1700
+ "judge_score": 29.187525646286417,
1701
+ "success_rate": 1.0
1702
+ },
1703
+ "SALMONN_7B": {
1704
+ "judge_score": 34.222404595814524,
1705
+ "success_rate": 0.9993844891259746
1706
+ },
1707
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
1708
+ "judge_score": 47.01682396389003,
1709
+ "success_rate": 0.9997948297086582
1710
+ },
1711
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1712
+ "judge_score": 39.32704144439885,
1713
+ "success_rate": 0.9993844891259746
1714
+ }
1715
+ },
1716
+ "gpt4o_judge": {
1717
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1718
+ "judge_score": 39.462453836684446,
1719
+ "success_rate": 1.0
1720
+ }
1721
+ }
1722
+ },
1723
+ "audiocaps_qa_test": {
1724
+ "llama3_70b_judge": {
1725
+ "Qwen-Audio-Chat": {
1726
+ "judge_score": 50.22364217252396,
1727
+ "success_rate": 1.0
1728
+ },
1729
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
1730
+ "judge_score": 18.466453674121407,
1731
+ "success_rate": 1.0
1732
+ },
1733
+ "WavLLM_fairseq": {
1734
+ "judge_score": 29.840255591054312,
1735
+ "success_rate": 1.0
1736
+ },
1737
+ "Qwen2-Audio-7B-Instruct": {
1738
+ "judge_score": 45.75079872204473,
1739
+ "success_rate": 1.0
1740
+ },
1741
+ "SALMONN_7B": {
1742
+ "judge_score": 50.287539936102235,
1743
+ "success_rate": 1.0
1744
+ },
1745
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
1746
+ "judge_score": 49.77635782747604,
1747
+ "success_rate": 1.0
1748
+ },
1749
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1750
+ "judge_score": 17.380191693290733,
1751
+ "success_rate": 1.0
1752
+ }
1753
+ },
1754
+ "gpt4o_judge": {
1755
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1756
+ "judge_score": 14.63258785942492,
1757
+ "success_rate": 1.0
1758
+ }
1759
+ }
1760
+ },
1761
+ "public_sg_speech_qa_test": {
1762
+ "llama3_70b_judge": {
1763
+ "Qwen-Audio-Chat": {
1764
+ "judge_score": 63.16860465116279,
1765
+ "success_rate": 0.9941860465116279
1766
+ },
1767
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
1768
+ "judge_score": 73.11046511627907,
1769
+ "success_rate": 0.998546511627907
1770
+ },
1771
+ "WavLLM_fairseq": {
1772
+ "judge_score": 58.54651162790698,
1773
+ "success_rate": 0.9825581395348837
1774
+ },
1775
+ "Qwen2-Audio-7B-Instruct": {
1776
+ "judge_score": 58.31395348837209,
1777
+ "success_rate": 0.9927325581395349
1778
+ },
1779
+ "SALMONN_7B": {
1780
+ "judge_score": 59.24418604651163,
1781
+ "success_rate": 0.997093023255814
1782
+ },
1783
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
1784
+ "judge_score": 59.7093023255814,
1785
+ "success_rate": 0.997093023255814
1786
+ },
1787
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1788
+ "judge_score": 64.94186046511628,
1789
+ "success_rate": 0.9927325581395349
1790
+ }
1791
+ },
1792
+ "gpt4o_judge": {
1793
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1794
+ "judge_score": 73.02325581395348,
1795
+ "success_rate": 1.0
1796
+ }
1797
+ }
1798
+ },
1799
+ "imda_30s_ds_human_test": {
1800
+ "llama3_70b_judge": {
1801
+ "Qwen-Audio-Chat": {
1802
+ "judge_score": 30.65,
1803
+ "success_rate": 0.995
1804
+ },
1805
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
1806
+ "judge_score": 50.15,
1807
+ "success_rate": 0.9975
1808
+ },
1809
+ "Qwen2-Audio-7B-Instruct": {
1810
+ "judge_score": 37.599999999999994,
1811
+ "success_rate": 0.995
1812
+ },
1813
+ "SALMONN_7B": {
1814
+ "judge_score": 16.15,
1815
+ "success_rate": 0.9975
1816
+ },
1817
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1818
+ "judge_score": 43.849999999999994,
1819
+ "success_rate": 1.0
1820
+ }
1821
+ },
1822
+ "gpt4o_judge": {
1823
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1824
+ "judge_score": 54.65,
1825
+ "success_rate": 1.0
1826
+ }
1827
+ }
1828
+ },
1829
+ "alpaca_audio_test": {
1830
+ "llama3_70b_judge": {
1831
+ "Qwen-Audio-Chat": {
1832
+ "judge_score": 9.8,
1833
+ "success_rate": 1.0
1834
+ },
1835
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
1836
+ "judge_score": 73.8,
1837
+ "success_rate": 1.0
1838
+ },
1839
+ "WavLLM_fairseq": {
1840
+ "judge_score": 21.6,
1841
+ "success_rate": 0.99
1842
+ },
1843
+ "Qwen2-Audio-7B-Instruct": {
1844
+ "judge_score": 52.599999999999994,
1845
+ "success_rate": 0.99
1846
+ },
1847
+ "SALMONN_7B": {
1848
+ "judge_score": 17.2,
1849
+ "success_rate": 1.0
1850
+ },
1851
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
1852
+ "judge_score": 74.80000000000001,
1853
+ "success_rate": 0.99
1854
+ },
1855
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1856
+ "judge_score": 70.8,
1857
+ "success_rate": 0.96
1858
+ }
1859
+ },
1860
+ "gpt4o_judge": {
1861
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
1862
+ "judge_score": 77.8,
1863
+ "success_rate": 1.0
1864
+ }
1865
+ }
1866
+ },
+ "imda_30s_sqa_human_test": {
+ "llama3_70b_judge": {
+ "Qwen-Audio-Chat": {
+ "judge_score": 42.199999999999996,
+ "success_rate": 1.0
+ },
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
+ "judge_score": 62.95,
+ "success_rate": 0.9975
+ },
+ "Qwen2-Audio-7B-Instruct": {
+ "judge_score": 47.1,
+ "success_rate": 0.995
+ },
+ "SALMONN_7B": {
+ "judge_score": 42.300000000000004,
+ "success_rate": 1.0
+ },
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
+ "judge_score": 55.7,
+ "success_rate": 1.0
+ }
+ },
+ "gpt4o_judge": {
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
+ "judge_score": 61.550000000000004,
+ "success_rate": 1.0
+ }
+ }
+ },
+ "imda_part4_30s_ds_human_test": {
+ "llama3_70b_judge": {
+ "Qwen-Audio-Chat": {
+ "judge_score": 16.0,
+ "success_rate": 0.99
+ },
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
+ "judge_score": 44.0,
+ "success_rate": 1.0
+ },
+ "WavLLM_fairseq": {
+ "judge_score": 31.6,
+ "success_rate": 1.0
+ },
+ "Qwen2-Audio-7B-Instruct": {
+ "judge_score": 24.8,
+ "success_rate": 0.97
+ },
+ "SALMONN_7B": {
+ "judge_score": 7.0,
+ "success_rate": 0.99
+ },
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
+ "judge_score": 46.4,
+ "success_rate": 1.0
+ },
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
+ "judge_score": 36.0,
+ "success_rate": 0.99
+ }
+ },
+ "gpt4o_judge": {
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
+ "judge_score": 48.2,
+ "success_rate": 1.0
+ }
+ }
+ },
+ "imda_part4_30s_sqa_human_test": {
+ "llama3_70b_judge": {
+ "Qwen-Audio-Chat": {
+ "judge_score": 37.8,
+ "success_rate": 1.0
+ },
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
+ "judge_score": 66.0,
+ "success_rate": 1.0
+ },
+ "WavLLM_fairseq": {
+ "judge_score": 46.6,
+ "success_rate": 1.0
+ },
+ "Qwen2-Audio-7B-Instruct": {
+ "judge_score": 39.6,
+ "success_rate": 1.0
+ },
+ "SALMONN_7B": {
+ "judge_score": 36.6,
+ "success_rate": 1.0
+ },
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
+ "judge_score": 53.2,
+ "success_rate": 1.0
+ },
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
+ "judge_score": 53.8,
+ "success_rate": 1.0
+ }
+ },
+ "gpt4o_judge": {
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
+ "judge_score": 61.4,
+ "success_rate": 1.0
+ }
+ }
+ },
+ "iemocap_gender_test": {
+ "llama3_70b_judge": {
+ "Qwen-Audio-Chat": {
+ "judge_score": 50.0996015936255,
+ "success_rate": 1.0
+ },
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
+ "judge_score": 15.737051792828685,
+ "success_rate": 1.0
+ },
+ "WavLLM_fairseq": {
+ "judge_score": 51.932270916334666,
+ "success_rate": 1.0
+ },
+ "Qwen2-Audio-7B-Instruct": {
+ "judge_score": 92.80876494023903,
+ "success_rate": 1.0
+ },
+ "SALMONN_7B": {
+ "judge_score": 81.31474103585658,
+ "success_rate": 1.0
+ },
+ "MERaLiON-AudioLLM-Whisper-SEA-LION": {
+ "judge_score": 93.48605577689243,
+ "success_rate": 1.0
+ },
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
+ "judge_score": 44.22310756972111,
+ "success_rate": 1.0
+ }
+ }
+ },
+ "imda_30s_gr_test": {
+ "llama3_70b_judge": {
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
+ "judge_score": 18.46666666666667,
+ "success_rate": 1.0
+ }
+ }
+ },
+ "imda_30s_ar_test": {
+ "llama3_70b_judge": {
+ "cascade_whisper_large_v2_gemma2_9b_cpt_sea_lionv3_instruct": {
+ "judge_score": 15.773333333333333,
+ "success_rate": 0.9996666666666667
+ },
+ "Qwen2-Audio-7B-Instruct": {
+ "judge_score": 5.106666666666667,
+ "success_rate": 1.0
+ },
+ "SALMONN_7B": {
+ "judge_score": 5.673333333333334,
+ "success_rate": 1.0
+ },
+ "cascade_whisper_large_v3_llama_3_8b_instruct": {
+ "judge_score": 27.186666666666667,
+ "success_rate": 0.9996666666666667
+ }
+ }
+ },
+ "mmau_mini": {
+ "llama3_70b_judge": {
+ "phi_4_multimodal_instruct": {
+ "judge_score": 59.4,
+ "success_rate": 1.0
+ }
+ }
+ },
+ "nlb_asr_test": {
+ "wer": {
+ "cascade_whisper_large_v3_llama_3_8b_instruct": 0.2796380263880551
+ }
+ }
+ }