smajumdar94 committed c0a65ce (parent: 348ccea): Update README.md

README.md CHANGED
@@ -51,7 +51,7 @@ img {
 This model transcribes speech into lowercase Catalan alphabet including spaces, dashes and apostrophes, and is trained on around 1023 hours of Catalan speech data.
 It is a non-autoregressive "large" variant of Conformer, with around 120 million parameters.
-See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#conformer-
+See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#conformer-transducer) for complete architecture details.

 ## Usage
@@ -68,7 +68,7 @@ pip install nemo_toolkit['all']
 ```python
 import nemo.collections.asr as nemo_asr
-asr_model = nemo_asr.models.
+asr_model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained("nvidia/stt_ca_conformer_transducer_large")
 ```

 ### Transcribing using Python
@@ -114,8 +114,6 @@ The vocabulary we use contains 44 characters:
 Full config can be found inside the .nemo files.

-The checkpoint of the language model used as the neural rescorer can be found [here](https://ngc.nvidia.com/catalog/models/nvidia:nemo:asrlm_en_transformer_large_ls). You may find more info on how to train and use language models for ASR models here: [ASR Language Modeling](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/asr_language_modeling.html)
-
 ### Datasets

 All the models in this collection are trained on the MCV-9.0 Catalan dataset, which contains around 1203 hours of training, 28 hours of development, and 27 hours of test speech audio.
@@ -128,12 +126,6 @@ The list of the available models in this collection is shown in the following table.
 |---------|-----------------------|-----------------|--------|---------|------------------|
 | 1.11.0 | SentencePiece Unigram | 128 | 4.43 | 3.85 | MCV-9.0 Train set |

-You may use language models (LMs) and beam search to improve the accuracy of the models, as reported in the following table.
-
-| Language Model | Test WER | Test WER w/ Oracle LM | Train Dataset | Settings |
-|----------------|----------|-----------------------|-------------------|------------------------------------------------|
-| N-gram LM | 3.83 | 3.40 | MCV-9.0 Train set | N=6, beam_width=8, ngram_alpha=1, ngram_beta=0 |
-

 ## Limitations
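The `ngram_alpha` and `ngram_beta` settings mentioned above weight an external n-gram LM against the acoustic model during beam search. As a rough illustration of that shallow-fusion scoring rule (the function, parameter names, and scores are hypothetical for illustration; this is not NeMo's decoder implementation):

```python
def fused_score(log_p_am: float, log_p_lm: float, n_words: int,
                alpha: float = 1.0, beta: float = 0.0) -> float:
    """Score one beam candidate: acoustic log-prob plus an
    alpha-weighted LM log-prob plus a beta word-count bonus."""
    return log_p_am + alpha * log_p_lm + beta * n_words

# Two hypothetical candidates for the same audio, scored with
# alpha=1, beta=0 (matching the ngram_alpha/ngram_beta settings above):
cand_a = fused_score(log_p_am=-4.2, log_p_lm=-3.1, n_words=4)  # about -7.3
cand_b = fused_score(log_p_am=-4.0, log_p_lm=-6.5, n_words=4)  # about -10.5
best = max(cand_a, cand_b)  # the LM term outweighs B's better acoustic score
```

A positive `beta` gives longer hypotheses a small bonus, counteracting the LM's tendency to prefer shorter outputs.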
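For reference, the WER figures reported in the tables above (e.g. 4.43 dev / 3.85 test) are word error rates: word-level edit distance divided by the number of reference words. A minimal self-contained sketch (the Catalan strings are made-up examples):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

# One dropped word out of four reference words: WER = 0.25
print(wer("bon dia a tothom", "bon dia tothom"))
```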