aandrusenko committed on
Commit c3ccb36
1 Parent(s): 30744db

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -98,9 +98,9 @@ Conformer-Transducer model is an autoregressive variant of Conformer model [1] f
 
 ## Training
 
-The NeMo toolkit [3] was used for finetuning from English SSL model for three hundred epochs. The model is finetuning with this [example script](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_transducer/speech_to_text_rnnt_bpe.py) and this [base config](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/conformer/conformer_transducer_bpe.yaml). As pretrained English SSL model we use [ssl_en_conformer_large](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/ssl_en_conformer_large) which was trained using LibriLight corpus (~56k hrs of unlabeled English speech).
+The NeMo toolkit [3] was used for fine-tuning from an English SSL model for several hundred epochs. The model was fine-tuned with this [example script](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_transducer/speech_to_text_rnnt_bpe.py) and this [base config](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/conformer/conformer_transducer_bpe.yaml). As the pretrained English SSL model we use [ssl_en_conformer_large](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/ssl_en_conformer_large), which was trained on the LibriLight corpus (~56k hrs of unlabeled English speech).
 
-The tokenizer (BPE vocab size 128) for the model was built using the text transcripts of the train set with this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
+The tokenizer for the model was built using the text transcripts of the train set with this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
 
 Full config can be found inside the .nemo files.
 
@@ -118,7 +118,7 @@ The list of the available models in this collection is shown in the following ta
 
 | Version | Tokenizer | Vocabulary Size | Dev WER| Test WER| Train Dataset |
 |---------|-----------------------|-----------------|--------|---------|-----------------|
-| 1.14.0 | SentencePiece BPE | 128 | 2.4 | 4.0 | MCV-11.0 Train set |
+| 1.14.0 | SentencePiece [2] BPE | 128 | 2.4 | 4.0 | MCV-11.0 Train set |
 
 
 ## Limitations
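
For readers following the training paragraph changed in the diff above: the SSL-to-ASR initialization it describes can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the exact recipe from the commit; the local config path, tokenizer directory, and manifest filenames are illustrative placeholders, and NeMo is assumed to be installed.

```python
# Minimal sketch (not the exact recipe): build a Conformer-Transducer from the
# linked base config and initialize its encoder from the English SSL checkpoint.
# All local paths below are assumptions and must exist before this runs.
import nemo.collections.asr as nemo_asr
from omegaconf import OmegaConf

cfg = OmegaConf.load("conformer_transducer_bpe.yaml")            # the linked base config
cfg.model.tokenizer.dir = "tokenizer_dir"                        # BPE tokenizer (see below)
cfg.model.train_ds.manifest_filepath = "train_manifest.json"     # placeholder manifest
cfg.model.validation_ds.manifest_filepath = "dev_manifest.json"  # placeholder manifest
asr_model = nemo_asr.models.EncDecRNNTBPEModel(cfg=cfg.model)

# Encoder weights come from ssl_en_conformer_large; the decoder and joint
# network start from scratch and are learned during fine-tuning.
ssl_model = nemo_asr.models.SpeechEncDecSelfSupervisedModel.from_pretrained(
    model_name="ssl_en_conformer_large"
)
asr_model.encoder.load_state_dict(ssl_model.encoder.state_dict(), strict=False)
del ssl_model  # the SSL model is no longer needed once weights are copied
```

In the actual recipe this initialization and the subsequent fine-tuning are driven by the linked example script and its config overrides rather than by hand-written code.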
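The tokenizer line in the diff corresponds to training a 128-token SentencePiece BPE model on the train-set transcripts. The linked process_asr_text_tokenizer.py script handles manifest parsing and output layout; the sketch below shows only the underlying SentencePiece call, with an assumed plain-text transcript file.

```python
# Sketch of what the tokenizer-building step boils down to. The input file
# "train_transcripts.txt" (one transcript per line) is an assumption; the
# linked NeMo script extracts this text from the training manifest itself.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="train_transcripts.txt",
    model_prefix="tokenizer",  # writes tokenizer.model and tokenizer.vocab
    vocab_size=128,            # matches the vocabulary size in the table above
    model_type="bpe",
)
```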
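"Full config can be found inside the .nemo files" can be made concrete: a .nemo checkpoint carries its full training configuration, which NeMo exposes after restoring the model. The checkpoint filename below is a placeholder for the released file.

```python
# Print the full config embedded in a .nemo checkpoint. The filename is a
# placeholder; substitute the path of the checkpoint downloaded from this repo.
import nemo.collections.asr as nemo_asr
from omegaconf import OmegaConf

model = nemo_asr.models.EncDecRNNTBPEModel.restore_from("model.nemo")
print(OmegaConf.to_yaml(model.cfg))  # the complete config, as YAML
```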