---
license: apache-2.0
---

# Omni-temporal Classification (OTC)
We propose BTC/OTC to train an ASR system directly with weak supervision, i.e., speech paired with non-verbatim transcripts. This is achieved by using a special token to model transcript uncertainties (substitution, insertion, and deletion errors) within the WFST framework during training.
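
The idea above can be sketched as a linear transcript graph augmented with extra arcs for each error type. The sketch below is illustrative only, not the icefall/k2 implementation: the token name `<star>`, the penalty values, and the exact arc layout are assumptions made for the demo.

```python
def otc_graph_arcs(transcript, bypass_penalty=-2.0, self_loop_penalty=-2.0):
    """Return (src_state, dst_state, label, weight) arcs for a linear
    transcript WFST augmented with OTC-style uncertainty arcs.

    For each transcript token we add:
      - a verbatim arc emitting the token itself,
      - a penalized bypass arc with the special token, loosely modeling
        substitution/deletion errors,
      - a penalized self-loop with the special token, loosely modeling
        insertion errors.
    """
    arcs = []
    for i, token in enumerate(transcript):
        arcs.append((i, i + 1, token, 0.0))                 # verbatim match
        arcs.append((i, i + 1, "<star>", bypass_penalty))   # substitution/deletion
        arcs.append((i, i, "<star>", self_loop_penalty))    # insertion self-loop
    # final state also allows trailing insertions
    arcs.append((len(transcript), len(transcript), "<star>", self_loop_penalty))
    return arcs

for arc in otc_graph_arcs(["a", "b"]):
    print(arc)
```

In the real recipe these arcs would be built with k2 and composed with the CTC topology; this plain-Python version only shows which arcs the special token adds on top of a standard linear transcript graph.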

OTC maintains reasonable ASR performance even when the transcripts contain up to 70% errors of mixed types (substitutions, insertions, and deletions).
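
To make the "transcript error rate" setting concrete, here is a small, purely illustrative generator of non-verbatim transcripts: each word is corrupted with the given probability by a random substitution, insertion, or deletion. The vocabulary and the uniform choice of error type are assumptions for the demo, not the recipe's actual corruption procedure.

```python
import random

def corrupt(words, error_rate, vocab, rng):
    """Corrupt a verbatim transcript: each word is, with probability
    `error_rate`, substituted, followed by an inserted word, or deleted."""
    out = []
    for w in words:
        if rng.random() < error_rate:
            op = rng.choice(["sub", "ins", "del"])
            if op == "sub":
                out.append(rng.choice(vocab))      # substitution
            elif op == "ins":
                out.append(w)
                out.append(rng.choice(vocab))      # insertion after the word
            # "del": drop the word entirely
        else:
            out.append(w)                          # kept verbatim
    return out

rng = random.Random(0)
vocab = ["the", "a", "cat", "dog", "sat"]
print(corrupt("the cat sat on the mat".split(), 0.5, vocab, rng))
```

A standard CTC loss assumes the printed output is verbatim, which is why its WER collapses in the tables below, while OTC absorbs the mismatches through the special token.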


## When transcript error rate = 0.5

### Results (WER (%), ctc-greedy-search)
<table>
  <tr>
    <td rowspan=2>Training Criterion</td>
    <td colspan=2>SSL features</td>
    <td colspan=2>fbank features</td>
  </tr>
  <tr>
    <td>test-clean</td>
    <td>test-other</td>
    <td>test-clean</td>
    <td>test-other</td>
  </tr>
  <tr>
    <td>CTC</td>
    <td>100.0</td>
    <td>100.0</td>
    <td>99.89</td>
    <td>99.98</td>
  </tr>
  <tr>
    <td>OTC</td>
    <td>11.89</td>
    <td>25.46</td>
    <td>20.14</td>
    <td>44.24</td>
  </tr>
</table>

### Results (WER (%), 1best, blank_bias=-4)
<table>
  <tr>
    <td rowspan=2>Training Criterion</td>
    <td colspan=2>SSL features</td>
    <td colspan=2>fbank features</td>
  </tr>
  <tr>
    <td>test-clean</td>
    <td>test-other</td>
    <td>test-clean</td>
    <td>test-other</td>
  </tr>
  <tr>
    <td>CTC</td>
    <td>98.40</td>
    <td>98.68</td>
    <td>99.79</td>
    <td>99.86</td>
  </tr>
  <tr>
    <td>OTC</td>
    <td>6.59</td>
    <td>15.98</td>
    <td>11.78</td>
    <td>32.38</td>
  </tr>
</table>