Commit c3ccb36 by aandrusenko
1 Parent(s): 30744db

Update README.md

README.md CHANGED
@@ -98,9 +98,9 @@ Conformer-Transducer model is an autoregressive variant of Conformer model [1] f
 
 ## Training
 
-The NeMo toolkit [3] was used for finetuning from English SSL model for
+The NeMo toolkit [3] was used for finetuning from English SSL model for over several hundred epochs. The model is finetuning with this [example script](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_transducer/speech_to_text_rnnt_bpe.py) and this [base config](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/conformer/conformer_transducer_bpe.yaml). As pretrained English SSL model we use [ssl_en_conformer_large](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/ssl_en_conformer_large) which was trained using LibriLight corpus (~56k hrs of unlabeled English speech).
 
-The tokenizer
+The tokenizer for the model was built using the text transcripts of the train set with this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
 
 Full config can be found inside the .nemo files.
 
@@ -118,7 +118,7 @@ The list of the available models in this collection is shown in the following ta
 
 | Version | Tokenizer | Vocabulary Size | Dev WER| Test WER| Train Dataset |
 |---------|-----------------------|-----------------|--------|---------|-----------------|
-| 1.14.0 | SentencePiece BPE | 128 | 2.4 | 4.0 | MCV-11.0 Train set |
+| 1.14.0 | SentencePiece [2] BPE | 128 | 2.4 | 4.0 | MCV-11.0 Train set |
 
 
 ## Limitations