smajumdar94 committed c0a65ce (parent: 348ccea): Update README.md

README.md CHANGED
@@ -51,7 +51,7 @@ img {
 This model transcribes speech into lowercase Catalan alphabet including spaces, dashes and apostrophes, and is trained on around 1023 hours of Catalan speech data.
 It is a non-autoregressive "large" variant of Conformer, with around 120 million parameters.
-See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#conformer-
+See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#conformer-transducer) for complete architecture details.

 ## Usage
@@ -68,7 +68,7 @@ pip install nemo_toolkit['all']
 ```python
 import nemo.collections.asr as nemo_asr
-asr_model = nemo_asr.models.
+asr_model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained("nvidia/stt_ca_conformer_transducer_large")
 ```

 ### Transcribing using Python
@@ -114,8 +114,6 @@ The vocabulary we use contains 44 characters:
 Full config can be found inside the .nemo files.

-The checkpoint of the language model used as the neural rescorer can be found [here](https://ngc.nvidia.com/catalog/models/nvidia:nemo:asrlm_en_transformer_large_ls). You may find more info on how to train and use language models for ASR models here: [ASR Language Modeling](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/asr_language_modeling.html)
-
 ### Datasets

 All the models in this collection are trained on the MCV-9.0 Catalan dataset, which contains around 1203 hours of training, 28 hours of development, and 27 hours of test speech audio.
@@ -128,12 +126,6 @@ The list of the available models in this collection is shown in the following table.
 |---------|-----------------------|-----------------|--------|---------|------------------|
 | 1.11.0 | SentencePiece Unigram | 128 | 4.43 | 3.85 | MCV-9.0 Train set |

-You may use language models (LMs) and beam search to improve the accuracy of the models, as reported in the following table.
-
-| Language Model | Test WER | Test WER w/ Oracle LM | Train Dataset | Settings |
-|----------------|----------|-----------------------|-------------------|------------------------------------------------|
-| N-gram LM | 3.83 | 3.40 | MCV-9.0 Train set | N=6, beam_width=8, ngram_alpha=1, ngram_beta=0 |
-

 ## Limitations
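The `ngram_alpha` and `ngram_beta` settings mentioned above weight an external n-gram LM against the acoustic model during beam search. As a rough illustration of that shallow-fusion scoring rule (the function, parameter names, and scores are hypothetical for illustration; this is not NeMo's decoder implementation):

```python
def fused_score(log_p_am: float, log_p_lm: float, n_words: int,
                alpha: float = 1.0, beta: float = 0.0) -> float:
    """Score one beam candidate: acoustic log-prob plus an
    alpha-weighted LM log-prob plus a beta word-count bonus."""
    return log_p_am + alpha * log_p_lm + beta * n_words

# Two hypothetical candidates for the same audio, scored with
# alpha=1, beta=0 (matching the ngram_alpha/ngram_beta settings above):
cand_a = fused_score(log_p_am=-4.2, log_p_lm=-3.1, n_words=4)  # about -7.3
cand_b = fused_score(log_p_am=-4.0, log_p_lm=-6.5, n_words=4)  # about -10.5
best = max(cand_a, cand_b)  # the LM term outweighs B's better acoustic score
```

A positive `beta` gives longer hypotheses a small bonus, counteracting the LM's tendency to prefer shorter outputs.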
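For reference, the WER figures reported in the tables above (e.g. 4.43 dev / 3.85 test) are word error rates: word-level edit distance divided by the number of reference words. A minimal self-contained sketch (the Catalan strings are made-up examples):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

# One dropped word out of four reference words: WER = 0.25
print(wer("bon dia a tothom", "bon dia tothom"))
```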