Yingxu He committed on
Commit
99868b2
·
verified ·
1 Parent(s): a6099b9

Update README.md

Files changed (1)
  1. README.md +13 -12
README.md CHANGED
@@ -57,6 +57,7 @@ as evidenced by evaluation results on Singapore's [Multitask National Speech Cor
57
  > For other tasks, we employ the LLM-as-a-Judge framework,
58
  > which uses a pre-trained large language model to evaluate task performance
59
  > by generating and scoring responses based on criteria such as relevance, coherence, and accuracy.
 
60
 
61
  <div class="table*">
62
  <table>
@@ -73,8 +74,8 @@ as evidenced by evaluation results on Singapore's [Multitask National Speech Cor
73
  </thead>
74
  <tbody>
75
  <tr>
76
- <td style="text-align: center;" rowspan="11"><em>Automatic Speech Recognition<br>WER (<span
77
- class="math inline">↓</span>)</em></td>
78
  <td style="text-align: center;">LibriSpeech-Test-Clean</td>
79
  <td style="text-align: center;">0.03</td>
80
  <td style="text-align: center;">0.03</td>
@@ -163,8 +164,8 @@ as evidenced by evaluation results on Singapore's [Multitask National Speech Cor
163
  <td style="text-align: center;">0.18</td>
164
  </tr>
165
  <tr>
166
- <td style="text-align: center;" rowspan="6"><em>Speech Translation<br>BLEU (<span
167
- class="math inline">↑</span>)</em></td>
168
  <td style="text-align: center;">CoVoST 2 En <span
169
  class="math inline">→</span> Id</td>
170
  <td style="text-align: center;"><strong><u>32.62</u></strong></td>
@@ -219,8 +220,8 @@ as evidenced by evaluation results on Singapore's [Multitask National Speech Cor
219
  <td style="text-align: center;">2.83</td>
220
  </tr>
221
  <tr>
222
- <td style="text-align: center;" rowspan="8"><em>Spoken Question Answering<br>LLM-as-a-Judge (<span
223
- class="math inline">↑</span>)</em></td>
224
  <td style="text-align: center;">SLUE-SQA-5</td>
225
  <td style="text-align: center;">82.94</td>
226
  <td style="text-align: center;">80.05</td>
@@ -285,8 +286,8 @@ as evidenced by evaluation results on Singapore's [Multitask National Speech Cor
285
  <td style="text-align: center;"><u>71.60</u></td>
286
  </tr>
287
  <tr>
288
- <td style="text-align: center;" rowspan="4"><em>Spoken Dialogue Summarization<br>LLM-as-a-Judge (<span
289
- class="math inline">↑</span>)</em></td>
290
  <td style="text-align: center;">MNSC-SDS-Part 3</td>
291
  <td style="text-align: center;"><u><strong>46.80</strong></u></td>
292
  <td style="text-align: center;">33.80</td>
@@ -319,8 +320,8 @@ as evidenced by evaluation results on Singapore's [Multitask National Speech Cor
319
  <td style="text-align: center;"><u>65.40</u></td>
320
  </tr>
321
  <tr>
322
- <td style="text-align: center;" rowspan="2"><em>Speech Instruction<br>LLM-as-a-Judge (<span
323
- class="math inline">↑</span>)</em></td>
324
  <td style="text-align: center;">OpenHermes-Audio</td>
325
  <td style="text-align: center;"><strong>71.4</strong></td>
326
  <td style="text-align: center;">44.8</td>
@@ -337,8 +338,8 @@ as evidenced by evaluation results on Singapore's [Multitask National Speech Cor
337
  <td style="text-align: center;"><u>73.80</u></td>
338
  </tr>
339
  <tr>
340
- <td style="text-align: center;" rowspan="4"><em>Paralinguistics<br>LLM-as-a-Judge (<span
341
- class="math inline">↑</span>)</em></td>
342
  <td style="text-align: center;">VoxCeleb-Gender-Test</td>
343
  <td style="text-align: center;"><strong><u>99.53</u></strong></td>
344
  <td style="text-align: center;">99.12</td>
 
57
  > For other tasks, we employ the LLM-as-a-Judge framework,
58
  > which uses a pre-trained large language model to evaluate task performance
59
  > by generating and scoring responses based on criteria such as relevance, coherence, and accuracy.
60
+ > Refer to the [AudioBench paper](https://arxiv.org/abs/2406.16020) for more details.
61
 
62
  <div class="table*">
63
  <table>
 
74
  </thead>
75
  <tbody>
76
  <tr>
77
+ <td style="text-align: center;" rowspan="11"><strong>Automatic Speech Recognition</strong><br>WER (<span
78
+ class="math inline">↓</span>)</td>
79
  <td style="text-align: center;">LibriSpeech-Test-Clean</td>
80
  <td style="text-align: center;">0.03</td>
81
  <td style="text-align: center;">0.03</td>
 
164
  <td style="text-align: center;">0.18</td>
165
  </tr>
166
  <tr>
167
+ <td style="text-align: center;" rowspan="6"><strong>Speech Translation</strong><br>BLEU (<span
168
+ class="math inline">↑</span>)</td>
169
  <td style="text-align: center;">CoVoST 2 En <span
170
  class="math inline">→</span> Id</td>
171
  <td style="text-align: center;"><strong><u>32.62</u></strong></td>
 
220
  <td style="text-align: center;">2.83</td>
221
  </tr>
222
  <tr>
223
+ <td style="text-align: center;" rowspan="8"><strong>Spoken Question Answering</strong><br>LLM-as-a-Judge (<span
224
+ class="math inline">↑</span>)</td>
225
  <td style="text-align: center;">SLUE-SQA-5</td>
226
  <td style="text-align: center;">82.94</td>
227
  <td style="text-align: center;">80.05</td>
 
286
  <td style="text-align: center;"><u>71.60</u></td>
287
  </tr>
288
  <tr>
289
+ <td style="text-align: center;" rowspan="4"><strong>Spoken Dialogue Summarization</strong><br>LLM-as-a-Judge (<span
290
+ class="math inline">↑</span>)</td>
291
  <td style="text-align: center;">MNSC-SDS-Part 3</td>
292
  <td style="text-align: center;"><u><strong>46.80</strong></u></td>
293
  <td style="text-align: center;">33.80</td>
 
320
  <td style="text-align: center;"><u>65.40</u></td>
321
  </tr>
322
  <tr>
323
+ <td style="text-align: center;" rowspan="2"><strong>Speech Instruction</strong><br>LLM-as-a-Judge (<span
324
+ class="math inline">↑</span>)</td>
325
  <td style="text-align: center;">OpenHermes-Audio</td>
326
  <td style="text-align: center;"><strong>71.4</strong></td>
327
  <td style="text-align: center;">44.8</td>
 
338
  <td style="text-align: center;"><u>73.80</u></td>
339
  </tr>
340
  <tr>
341
+ <td style="text-align: center;" rowspan="4"><strong>Paralinguistics</strong><br>LLM-as-a-Judge (<span
342
+ class="math inline">↑</span>)</td>
343
  <td style="text-align: center;">VoxCeleb-Gender-Test</td>
344
  <td style="text-align: center;"><strong><u>99.53</u></strong></td>
345
  <td style="text-align: center;">99.12</td>