|
---
license: apache-2.0
---
|
|
|
# Omni-temporal Classification (OTC) |
|
We propose BTC/OTC to train an ASR system directly from weak supervision, i.e., speech paired with non-verbatim transcripts. This is achieved by introducing a special token into the WFST framework during training to model transcript uncertainties (substitution, insertion, and deletion errors).
|
|
|
OTC maintains reasonable ASR performance even when up to 70% of the transcript content is erroneous, across all three error types.
|
|
|
|
|
## When the transcript error rate is 0.5

### Results: WER (%) with CTC greedy-search decoding
|
<table> |
|
<tr> |
|
<td rowspan=2>Training Criterion</td> |
|
<td colspan=2>SSL features</td>

<td colspan=2>fbank features</td>
|
</tr> |
|
<tr> |
|
<td>test-clean</td> |
|
<td>test-other</td> |
|
<td>test-clean</td> |
|
<td>test-other</td> |
|
</tr> |
|
<tr> |
|
<td>CTC</td> |
|
<td>100.0</td> |
|
<td>100.0</td> |
|
<td>99.89</td> |
|
<td>99.98</td> |
|
</tr> |
|
<tr> |
|
<td>OTC</td> |
|
<td>11.89</td> |
|
<td>25.46</td> |
|
<td>20.14</td> |
|
<td>44.24</td> |
|
</tr> |
|
</table> |
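The greedy-search results above come from standard CTC greedy decoding: take the most likely token per frame, merge consecutive repeats, and drop blanks. A minimal sketch (not the repository's actual decoder; `blank_id = 0` and the toy vocabulary are assumptions):

```python
import numpy as np

def ctc_greedy_decode(log_probs: np.ndarray, blank_id: int = 0) -> list:
    """Collapse a (T, V) matrix of frame-level log-probabilities into a
    label sequence: argmax per frame, merge repeated labels, drop blanks."""
    best = log_probs.argmax(axis=-1)  # best label for each of the T frames
    out, prev = [], None
    for label in best:
        if label != prev and label != blank_id:
            out.append(int(label))
        prev = label
    return out

# Toy example: 5 frames over a 3-token vocabulary (0 = blank).
frame_probs = np.array([
    [0.8, 0.1, 0.1],   # blank
    [0.1, 0.8, 0.1],   # token 1
    [0.1, 0.8, 0.1],   # token 1 repeated -> merged
    [0.8, 0.1, 0.1],   # blank
    [0.1, 0.1, 0.8],   # token 2
])
print(ctc_greedy_decode(np.log(frame_probs)))  # [1, 2]
```

Because each frame is decoded independently, greedy search is fast but cannot recover from a locally wrong argmax, which is one reason the 1-best (lattice) results below are stronger.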
|
|
|
### Results: WER (%) with 1-best decoding (blank_bias = -4)
|
<table> |
|
<tr> |
|
<td rowspan=2>Training Criterion</td> |
|
<td colspan=2>SSL features</td>

<td colspan=2>fbank features</td>
|
</tr> |
|
<tr> |
|
<td>test-clean</td> |
|
<td>test-other</td> |
|
<td>test-clean</td> |
|
<td>test-other</td> |
|
</tr> |
|
<tr> |
|
<td>CTC</td> |
|
<td>98.40</td> |
|
<td>98.68</td> |
|
<td>99.79</td> |
|
<td>99.86</td> |
|
</tr> |
|
<tr> |
|
<td>OTC</td> |
|
<td>6.59</td> |
|
<td>15.98</td> |
|
<td>11.78</td> |
|
<td>32.38</td> |
|
</tr> |
|
</table> |
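The `blank_bias = -4` setting above adds a constant to the blank token's log-probability before decoding; a negative bias discourages blank emissions, which helps when a weakly supervised model over-predicts blank. A minimal sketch of the idea (the function name and `blank_id = 0` are assumptions, not the repository's API):

```python
import numpy as np

def apply_blank_bias(log_probs: np.ndarray,
                     blank_bias: float = -4.0,
                     blank_id: int = 0) -> np.ndarray:
    """Return a copy of (T, V) log-probabilities with a constant bias
    added to the blank token's score in every frame."""
    biased = log_probs.copy()
    biased[..., blank_id] += blank_bias
    return biased

# One frame where blank narrowly wins: after a -4 bias, token 1 wins instead.
frame = np.log(np.array([[0.5, 0.4, 0.1]]))
print(frame.argmax(axis=-1))                      # [0] (blank)
print(apply_blank_bias(frame).argmax(axis=-1))    # [1]
```

In the WFST setting, the same effect is obtained by shifting the blank arc scores before lattice search; since the bias is constant, it only reweights blank against non-blank tokens and leaves their relative order unchanged.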