# Introduction

## How to clone this repo
```
sudo apt-get install git-lfs
git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-transducer-stateless-bpe-500-2022-02-07

cd icefall-asr-librispeech-transducer-stateless-bpe-500-2022-02-07
git lfs pull
```

**Caution**: You have to run `git lfs pull`. Otherwise, you will end up with Git LFS pointer files instead of the actual model files.

The model in this repo is trained using the commit `TODO`.

You can use

```
git clone https://github.com/k2-fsa/icefall
cd icefall
git checkout TODO
```

to download `icefall`.

You can find the model information by visiting <https://github.com/k2-fsa/icefall/blob/TODO/egs/librispeech/ASR/transducer_stateless/train.py#L198>.

In short, the encoder is a Conformer model with 8 attention heads, 12 encoder layers, 512-dim attention, and a 2048-dim feedforward module;
the decoder contains a 1024-dim embedding layer and a Conv1d layer with kernel size 2.

The decoder architecture is modified from
[Rnn-Transducer with Stateless Prediction Network](https://ieeexplore.ieee.org/document/9054419).
A Conv1d layer is placed right after the input embedding layer.
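
The data flow of this stateless decoder can be sketched in plain Python: an embedding lookup for the last two tokens followed by a kernel-size-2 convolution (here reduced to a per-position weighted sum). All dimensions and weights below are toy values for illustration, not the trained 1024-dim parameters.

```python
KERNEL = 2  # kernel size of the Conv1d after the embedding layer

def decoder_output(context, embedding, w):
    """Toy stand-in for the stateless decoder.

    context:   list of the last KERNEL token ids (the decoder "state"
               is just these tokens, no RNN hidden state).
    embedding: dict mapping token id -> embedding vector (list of floats).
    w:         per-position scalar weights simulating a Conv1d with
               kernel size KERNEL (a real Conv1d mixes channels too).
    """
    vecs = [embedding[t] for t in context[-KERNEL:]]
    dim = len(vecs[0])
    # Weighted sum over the KERNEL context positions, per dimension.
    return [sum(w[k] * vecs[k][d] for k in range(KERNEL)) for d in range(dim)]
```

Because the decoder sees only a fixed window of previous tokens, its output depends on at most the last two symbols, which is what makes it "stateless".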

-----

## Description

This repo provides a pre-trained Conformer transducer model for the LibriSpeech dataset
using [icefall][icefall]. There are no RNNs in the decoder. The decoder is stateless
and contains only an embedding layer and a Conv1d layer.

The commands for training are:

```
cd egs/librispeech/ASR/
./prepare.sh
export CUDA_VISIBLE_DEVICES="0,1,2,3"
./transducer_stateless/train.py \
  --world-size 4 \
  --num-epochs 76 \
  --start-epoch 0 \
  --exp-dir transducer_stateless/exp-full \
  --full-libri 1 \
  --max-duration 300 \
  --lr-factor 5 \
  --bpe-model data/lang_bpe_500/bpe.model \
  --modified-transducer-prob 0.25
```

The tensorboard training log can be found at
<https://tensorboard.dev/experiment/qgvWkbF2R46FYA6ZMNmOjA/>.

The commands for decoding are:

```
epoch=61
avg=18

## greedy search
for sym in 1 2 3; do
  ./transducer_stateless/decode.py \
    --epoch $epoch \
    --avg $avg \
    --exp-dir transducer_stateless/exp-full \
    --bpe-model ./data/lang_bpe_500/bpe.model \
    --max-duration 100 \
    --max-sym-per-frame $sym
done

## modified beam search

./transducer_stateless/decode.py \
  --epoch $epoch \
  --avg $avg \
  --exp-dir transducer_stateless/exp-full \
  --bpe-model ./data/lang_bpe_500/bpe.model \
  --max-duration 100 \
  --context-size 2 \
  --decoding-method modified_beam_search \
  --beam-size 4
```
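
The `--max-sym-per-frame` flag bounds how many symbols greedy search may emit per encoder frame before it is forced to advance. A minimal pure-Python sketch of that loop follows; the `joiner` callback is a hypothetical stand-in for the real joiner network, returning the most likely symbol id for a frame given the current hypothesis, with 0 meaning blank.

```python
BLANK = 0  # symbol id 0 is the blank symbol

def greedy_search(joiner, num_frames, max_sym_per_frame):
    """Greedy transducer decoding for one utterance.

    joiner(t, hyp) -> best symbol id for frame t given hypothesis hyp.
    """
    hyp = []
    for t in range(num_frames):
        # Emit at most max_sym_per_frame non-blank symbols per frame.
        for _ in range(max_sym_per_frame):
            sym = joiner(t, hyp)
            if sym == BLANK:
                break  # blank: advance to the next frame
            hyp.append(sym)
    return hyp

def make_toy_joiner(schedule):
    """Toy joiner for illustration: schedule[t] lists the symbols
    to emit at frame t, after which it returns blank."""
    counts = {}
    def joiner(t, hyp):
        i = counts.get(t, 0)
        syms = schedule.get(t, [])
        if i < len(syms):
            counts[t] = i + 1
            return syms[i]
        return BLANK
    return joiner
```

With `max_sym_per_frame=1` at most one symbol can be emitted per frame, which is why the three greedy-search settings in the WER table differ only marginally on this model.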

You can find the decoding logs for the above commands in this
repo (in the folder `log`).

The WERs for the test datasets are:

| | test-clean | test-other | comment |
|-------------------------------------|------------|------------|------------------------------------------|
| greedy search (max sym per frame 1) | 2.68 | 6.71 | --epoch 61, --avg 18, --max-duration 100 |
| greedy search (max sym per frame 2) | 2.69 | 6.71 | --epoch 61, --avg 18, --max-duration 100 |
| greedy search (max sym per frame 3) | 2.69 | 6.71 | --epoch 61, --avg 18, --max-duration 100 |
| modified beam search (beam size 4) | 2.67 | 6.64 | --epoch 61, --avg 18, --max-duration 100 |

# File description

- [log][log], this directory contains the decoding logs and decoding results
- [test_wavs][test_wavs], this directory contains wave files for testing the pre-trained model
- [data][data], this directory contains files generated by [prepare.sh][prepare]
- [exp][exp], this directory contains only one file: `pretrained.pt`

`exp/pretrained.pt` is generated by the following command:
```
./transducer_stateless/export.py \
  --epoch 61 \
  --avg 18 \
  --bpe-model data/lang_bpe_500/bpe.model \
  --exp-dir transducer_stateless/exp-full
```

**HINT**: To use `pretrained.pt` to compute the WER for test-clean and test-other,
just do the following:
```
cp icefall-asr-librispeech-transducer-stateless-bpe-500-2022-02-07/exp/pretrained.pt \
   /path/to/icefall/egs/librispeech/ASR/transducer_stateless/exp/epoch-999.pt
```
and pass `--epoch 999 --avg 1` to `transducer_stateless/decode.py`.
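
The `--avg` flag averages model parameters over the last N checkpoints before decoding, so `--avg 1` simply uses the single `epoch-999.pt` file as-is. A minimal sketch of that averaging step, with checkpoints represented as plain dicts of float lists rather than real PyTorch state dicts:

```python
def average_checkpoints(state_dicts):
    """Element-wise average of parameter values across checkpoints.

    Each state dict maps a parameter name to a flat list of floats;
    real checkpoints hold tensors, but the arithmetic is the same.
    """
    n = len(state_dicts)
    avg = {}
    for name, values in state_dicts[0].items():
        avg[name] = [
            sum(sd[name][i] for sd in state_dicts) / n
            for i in range(len(values))
        ]
    return avg
```

Averaging the final checkpoints smooths out epoch-to-epoch noise in the weights, which is why `--epoch 61 --avg 18` decodes better than any single epoch here.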
129
+
130
+
131
+ [icefall]: https://github.com/k2-fsa/icefall
132
+ [prepare]: https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/prepare.sh
133
+ [exp]: https://huggingface.co/csukuangfj/icefall-asr-librispeech-transducer-stateless-bpe-500-2022-02-07/tree/main/exp
134
+ [data]: https://huggingface.co/csukuangfj/icefall-asr-librispeech-transducer-stateless-bpe-500-2022-02-07/tree/main/data
135
+ [test_wavs]: https://huggingface.co/csukuangfj/icefall-asr-librispeech-transducer-stateless-bpe-500-2022-02-07/tree/main/test_wavs
136
+ [log]: https://huggingface.co/csukuangfj/icefall-asr-librispeech-transducer-stateless-bpe-500-2022-02-07/tree/main/log
137
+ [icefall]: https://github.com/k2-fsa/icefall