Automatic Speech Recognition
NeMo
PyTorch
Italian
speech
audio
Transducer
Conformer
Transformer
hf-asr-leaderboard
Eval Results
igitman committed on
Commit 3a5de75 · 1 Parent(s): 3ba4e03

Update README.md

Files changed (1)
  1. README.md +3 -3
@@ -185,9 +185,9 @@ All the models in this collection are trained on a composite dataset (NeMo ASRSE
 
 The list of the available models in this collection is shown in the following table. Performances of the ASR models are reported in terms of Word Error Rate (WER%) with greedy decoding.
 
-| Version | Tokenizer | Vocabulary Size | MCV 11.0 Dev | MCV 11.0 Test | MLS Dev | MLS Test | VoxPopuli Dev | VoxPopuli Test | Train Dataset |
-|---------|-----------------------|-----------------|--------------|---------------|---------|----------|---------------|----------------|-----------------|
-| 1.13.0 | SentencePiece Unigram | 1024 | 4.80 | 5.24 | 14.62 | 12.18 | 12.00 | 15.15 | NeMo ASRSET 2.0 |
+| Version | Tokenizer | Vocabulary Size | MCV 11.0 Dev | MCV 11.0 Test | MLS Dev | MLS Test | VoxPopuli Dev | VoxPopuli Test | Train Dataset |
+|---------|-----------------------|-----------------|--------------|---------------|---------|----------|---------------|----------------|--------------------|
+| 1.13.0 | SentencePiece Unigram | 1024 | 4.80 | 5.24 | 14.62 | 12.18 | 12.00 | 15.15 | NeMo ASRSET It 2.0 |
 
 ## Limitations
 Since this model was trained on publicly available speech datasets, the performance of this model might degrade for speech which includes technical terms, or vernacular that the model has not been trained on. The model might also perform worse for accented speech.
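
Note on the metric: the README reports results as Word Error Rate (WER%). As a rough illustration of what that number measures, here is a minimal sketch in plain Python that computes WER for a single reference/hypothesis pair as word-level edit distance divided by the reference word count. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example; the README does not describe the exact scoring script or text normalization used for the table.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent for one reference/hypothesis pair.

    Hypothetical helper for illustration: words are split on whitespace,
    with no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of five reference words.
    print(word_error_rate("il gatto dorme sul divano", "il gatto dorme divano"))
```

Dataset-level figures such as those in the table are usually obtained by summing errors and reference words over all utterances before dividing, rather than by averaging per-utterance values; whether this model card does so is not stated in the README.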