smajumdar94 committed on
Commit
c0a65ce
1 Parent(s): 348ccea

Update README.md

Files changed (1)
  1. README.md +2 -10
README.md CHANGED
@@ -51,7 +51,7 @@ img {
 
 This model transcribes speech into lowercase Catalan alphabet including spaces, dashes and apostrophes, and is trained on around 1023 hours of Catalan speech data.
 It is a non-autoregressive "large" variant of Conformer, with around 120 million parameters.
-See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#conformer-ctc) for complete architecture details.
+See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#conformer-transducer) for complete architecture details.
 
 
 ## Usage
@@ -68,7 +68,7 @@ pip install nemo_toolkit['all']
 
 ```python
 import nemo.collections.asr as nemo_asr
-asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained("nvidia/stt_ca_conformer_transducer_large")
+asr_model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained("nvidia/stt_ca_conformer_transducer_large")
 ```
 
 ### Transcribing using Python
@@ -114,8 +114,6 @@ The vocabulary we use contains 44 characters:
 
 Full config can be found inside the .nemo files.
 
-The checkpoint of the language model used as the neural rescorer can be found [here](https://ngc.nvidia.com/catalog/models/nvidia:nemo:asrlm_en_transformer_large_ls). You may find more info on how to train and use language models for ASR models here: [ASR Language Modeling](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/asr_language_modeling.html)
-
 ### Datasets
 
 All the models in this collection are trained on MCV-9.0 Catalan dataset, which contains around 1203 hours training, 28 hours of development and 27 hours of testing speech audios.
@@ -128,12 +126,6 @@ The list of the available models in this collection is shown in the following ta
 |---------|-----------------------|-----------------|--------|---------|-----------------|
 | 1.11.0 | SentencePiece Unigram | 128 |4.43 | 3.85 | MCV-9.0 Train set|
 
-You may use language models (LMs) and beam search to improve the accuracy of the models, as reported in the follwoing table.
-
-| Language Model | Test WER | Test WER w/ Oracle LM | Train Dataset | Settings |
-|----------------|----------|-----------------------|------------------|-------------------------------------------------------|
-| N-gram LM | 3.83 | 3.40 |MCV-9.0 Train set |N=6, beam_width=8, ngram_alpha=1, ngram_beta=0 |
-
 
 ## Limitations
 
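The substance of this commit is swapping the CTC loading class for the Transducer one: RNNT/Transducer checkpoints such as `stt_ca_conformer_transducer_large` must be loaded with `EncDecRNNTBPEModel`, not `EncDecCTCModelBPE`. A minimal sketch of the corrected load-and-transcribe flow follows; the helper name and the `"audio.wav"` path are illustrative, not from the README:

```python
# Sketch: loading the Catalan Conformer-Transducer model from NeMo.
# Assumes `pip install nemo_toolkit['all']` has been run; downloads the
# checkpoint from NGC/Hugging Face on first call.
MODEL_NAME = "nvidia/stt_ca_conformer_transducer_large"

def load_catalan_transducer():
    """Load the Catalan Conformer-Transducer ASR model."""
    import nemo.collections.asr as nemo_asr
    # Transducer (RNNT) checkpoints load via EncDecRNNTBPEModel; the class
    # this commit replaces, EncDecCTCModelBPE, only loads CTC checkpoints.
    return nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(model_name=MODEL_NAME)

# Usage (not executed here; "audio.wav" is a placeholder file path):
# asr_model = load_catalan_transducer()
# print(asr_model.transcribe(["audio.wav"]))
```

`transcribe()` takes a list of audio file paths and returns the corresponding lowercase Catalan transcriptions.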