Yingxu He committed
Update README.md

README.md CHANGED
@@ -57,6 +57,7 @@ as evidenced by evaluation results on Singapore's [Multitask National Speech Cor
 > For other tasks, we employ the LLM-as-a-Judge framework,
 > which uses a pre-trained large language model to evaluate task performance
 > by generating and scoring responses based on criteria such as relevance, coherence, and accuracy.
+> Refer to the [AudioBench paper](https://arxiv.org/abs/2406.16020) for more details.
 
 <div class="table*">
 <table>
@@ -73,8 +74,8 @@ as evidenced by evaluation results on Singapore's [Multitask National Speech Cor
 </thead>
 <tbody>
 <tr>
-<td style="text-align: center;" rowspan="11"><
-class="math inline">↓</span>)</
+<td style="text-align: center;" rowspan="11"><strong>Automatic Speech Recognition</strong><br>WER (<span
+class="math inline">↓</span>)</td>
 <td style="text-align: center;">LibriSpeech-Test-Clean</td>
 <td style="text-align: center;">0.03</td>
 <td style="text-align: center;">0.03</td>
@@ -163,8 +164,8 @@ as evidenced by evaluation results on Singapore's [Multitask National Speech Cor
 <td style="text-align: center;">0.18</td>
 </tr>
 <tr>
-<td style="text-align: center;" rowspan="6"><
-class="math inline">↑</span>)</
+<td style="text-align: center;" rowspan="6"><strong>Speech Translation</strong><br>BLEU (<span
+class="math inline">↑</span>)</td>
 <td style="text-align: center;">CoVoST 2 En <span
 class="math inline">→</span> Id</td>
 <td style="text-align: center;"><strong><u>32.62</u></strong></td>
@@ -219,8 +220,8 @@ as evidenced by evaluation results on Singapore's [Multitask National Speech Cor
 <td style="text-align: center;">2.83</td>
 </tr>
 <tr>
-<td style="text-align: center;" rowspan="8"><
-class="math inline">↑</span>)</
+<td style="text-align: center;" rowspan="8"><strong>Spoken Question Answering</strong><br>LLM-as-a-Judge (<span
+class="math inline">↑</span>)</td>
 <td style="text-align: center;">SLUE-SQA-5</td>
 <td style="text-align: center;">82.94</td>
 <td style="text-align: center;">80.05</td>
@@ -285,8 +286,8 @@ as evidenced by evaluation results on Singapore's [Multitask National Speech Cor
 <td style="text-align: center;"><u>71.60</u></td>
 </tr>
 <tr>
-<td style="text-align: center;" rowspan="4"><
-class="math inline">↑</span>)</
+<td style="text-align: center;" rowspan="4"><strong>Spoken Dialogue Summarization</strong><br>LLM-as-a-Judge (<span
+class="math inline">↑</span>)</td>
 <td style="text-align: center;">MNSC-SDS-Part 3</td>
 <td style="text-align: center;"><u><strong>46.80</strong></u></td>
 <td style="text-align: center;">33.80</td>
@@ -319,8 +320,8 @@ as evidenced by evaluation results on Singapore's [Multitask National Speech Cor
 <td style="text-align: center;"><u>65.40</u></td>
 </tr>
 <tr>
-<td style="text-align: center;" rowspan="2"><
-class="math inline">↑</span>)</
+<td style="text-align: center;" rowspan="2"><strong>Speech Instruction</strong><br>LLM-as-a-Judge (<span
+class="math inline">↑</span>)</td>
 <td style="text-align: center;">OpenHermes-Audio</td>
 <td style="text-align: center;"><strong>71.4</strong></td>
 <td style="text-align: center;">44.8</td>
@@ -337,8 +338,8 @@ as evidenced by evaluation results on Singapore's [Multitask National Speech Cor
 <td style="text-align: center;"><u>73.80</u></td>
 </tr>
 <tr>
-<td style="text-align: center;" rowspan="4"><
-class="math inline">↑</span>)</
+<td style="text-align: center;" rowspan="4"><strong>Paralinguistics</strong><br>LLM-as-a-Judge (<span
+class="math inline">↑</span>)</td>
 <td style="text-align: center;">VoxCeleb-Gender-Test</td>
 <td style="text-align: center;"><strong><u>99.53</u></strong></td>
 <td style="text-align: center;">99.12</td>
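The quoted note describes the LLM-as-a-Judge protocol only at a high level. As a rough illustration of what such a judge looks like, here is a minimal sketch assuming an OpenAI-style chat API; the judge model name, prompt wording, and 0-100 scale are assumptions for illustration, not the AudioBench implementation (the linked paper documents the actual protocol):

```python
# Minimal LLM-as-a-Judge sketch. The judge model, prompt wording, and
# 0-100 scale are illustrative assumptions; see the AudioBench paper
# for the protocol actually used to produce the scores in the table.
import re
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = """\
You are grading a model's answer to a spoken-language task.
Question (transcribed audio): {question}
Reference answer: {reference}
Model answer: {answer}
Rate the model answer from 0 to 100 for relevance, coherence, and accuracy.
Reply with the integer score only."""

def judge_score(question: str, reference: str, answer: str) -> int:
    """Ask the judge LLM for a 0-100 score and parse the first integer."""
    reply = client.chat.completions.create(
        model="gpt-4o",  # hypothetical judge; any capable LLM could sit here
        temperature=0,
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(
                question=question, reference=reference, answer=answer),
        }],
    )
    match = re.search(r"\d+", reply.choices[0].message.content)
    return int(match.group()) if match else 0
```

A dataset-level figure such as the SLUE-SQA-5 entries in the table would then be the mean of `judge_score` over all test items.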