Commit 13f33b0 by smajumdar94
1 parent: e502137

Update README.md
README.md CHANGED
@@ -1,7 +1,4 @@
 ---
-
-
-
 language:
 - be
 library_name: nemo
@@ -78,8 +75,9 @@ asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained("nvidia/stt_be_con
 ```
 
 ### Transcribing using Python
-
+
 Simply do:
+
 ```
 asr_model.transcribe(['sample.wav'])
 ```
@@ -88,7 +86,7 @@ asr_model.transcribe(['sample.wav'])
 
 ```shell
 python [NEMO_GIT_FOLDER]/examples/asr/transcribe_speech.py
-pretrained_name="nvidia/
+pretrained_name="nvidia/stt_be_conformer_ctc_large"
 audio_dir="<DIRECTORY CONTAINING AUDIO FILES>"
 ```
 
@@ -120,11 +118,15 @@ All the models in this collection are trained on a composite dataset (NeMo ASRSE
 
 ## Performance
 
-Performances of the ASR models are reported in terms of Word Error Rate (WER%) with greedy decoding.
+Performances of the ASR models are reported in terms of Word Error Rate (WER%) with greedy decoding.
+
+| Version | Tokenizer | Vocabulary Size | MCV 10 Test | Train Dataset |
+|---------|----------------------|-----------------|-------------|---------------|
+| 1.12.0 | Google Sentencepiece | 1024 | 4.8 | MCV 10 |
 
 ## Limitations
 
-Since all models are trained on just
+Since all models are trained on just academic datasets, the performance of this model might degrade for speech which includes technical terms, or vernacular that the model has not been trained on. The model might also perform worse for accented speech.
 
 ## Deployment with NVIDIA Riva
 
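
Note on the added Performance table: it reports Word Error Rate (WER%) on the MCV 10 test set. As a rough illustration of what that number measures, the sketch below computes WER as the word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words. It is not the evaluation code used for the model card; the function name `word_error_rate` is hypothetical, and it assumes plain whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent between a reference and a hypothesis transcript.

    Illustrative only: splits on whitespace and applies no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)
```

For example, comparing the reference `"a b c d"` with the hypothesis `"a x c"` involves one substitution and one deletion over four reference words, so the function returns 50.0.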