---
license: apache-2.0
---
# Omni-temporal Classification (OTC)
We propose BTC/OTC to train an ASR system directly from weak supervision, i.e., speech paired with non-verbatim transcripts. This is achieved by using a special token
to model uncertainties in the transcript (substitution, insertion, and deletion errors) within the WFST framework during training.
OTC maintains reasonable ASR performance even when the transcripts contain up to 70% errors of these types.
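OTC builds on the CTC topology. As a point of reference, the sketch below implements the standard CTC forward algorithm in plain NumPy (an illustrative toy, not the icefall/k2 implementation; the function name `ctc_forward_logprob` is made up here). Conceptually, OTC augments this alignment graph with arcs for the special token so that substituted, inserted, or deleted words in the transcript remain alignable, rather than forcing an exact verbatim alignment as CTC does.

```python
import numpy as np

def ctc_forward_logprob(log_probs, labels, blank=0):
    """Log-probability of `labels` under the standard CTC alignment graph.

    log_probs: (T, V) array of per-frame log posteriors.
    labels:    transcript token ids (no blanks).
    """
    T, _ = log_probs.shape
    # Extended sequence: blank between and around every label.
    ext = [blank]
    for l in labels:
        ext += [l, blank]
    S = len(ext)
    NEG = -1e30
    alpha = np.full((T, S), NEG)
    alpha[0, 0] = log_probs[0, ext[0]]
    if S > 1:
        alpha[0, 1] = log_probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            best = alpha[t - 1, s]                      # stay
            if s >= 1:
                best = np.logaddexp(best, alpha[t - 1, s - 1])  # advance
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                best = np.logaddexp(best, alpha[t - 1, s - 2])  # skip blank
            alpha[t, s] = best + log_probs[t, ext[s]]
    # Valid endings: final label or trailing blank.
    tail = alpha[T - 1, S - 2] if S > 1 else NEG
    return np.logaddexp(alpha[T - 1, S - 1], tail)

# Toy check: 2 frames, 2 symbols (blank=0, label=1), uniform posteriors.
# Paths yielding "1": (blank,1), (1,blank), (1,1) -> 3 * 0.25 = 0.75.
lp = np.log(np.full((2, 2), 0.5))
print(np.exp(ctc_forward_logprob(lp, [1])))  # -> 0.75
```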
## When transcript error rate = 0.5
### Results (WER (%)) (ctc-greedy-search)
<table>
<tr>
<td rowspan=2>Training Criterion</td>
<td colspan=2>SSL features</td>
<td colspan=2>fbank features</td>
</tr>
<tr>
<td>test-clean</td>
<td>test-other</td>
<td>test-clean</td>
<td>test-other</td>
</tr>
<tr>
<td>CTC</td>
<td>100.0</td>
<td>100.0</td>
<td>99.89</td>
<td>99.98</td>
</tr>
<tr>
<td>OTC</td>
<td>11.89</td>
<td>25.46</td>
<td>20.14</td>
<td>44.24</td>
</tr>
</table>
### Results (WER (%)) (1best, blank_bias=-4)
<table>
<tr>
<td rowspan=2>Training Criterion</td>
<td colspan=2>SSL features</td>
<td colspan=2>fbank features</td>
</tr>
<tr>
<td>test-clean</td>
<td>test-other</td>
<td>test-clean</td>
<td>test-other</td>
</tr>
<tr>
<td>CTC</td>
<td>98.40</td>
<td>98.68</td>
<td>99.79</td>
<td>99.86</td>
</tr>
<tr>
<td>OTC</td>
<td>6.59</td>
<td>15.98</td>
<td>11.78</td>
<td>32.38</td>
</tr>
</table>
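The WER figures above are word-level edit distance (substitutions + insertions + deletions) divided by the number of reference words. A minimal self-contained sketch (not the scoring script used for these tables):

```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate (%) via Levenshtein distance over words."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between r[:i] and h[:j].
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                      # i deletions
    for j in range(len(h) + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match/substitution
    return 100.0 * d[len(r)][len(h)] / len(r)

print(wer("speech with errors", "speech with errors"))  # -> 0.0
print(wer("hello world", "world"))                      # -> 50.0
```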