imedennikov committed
Commit 633d071 • 1 Parent(s): 3a2788c
Update README.md

README.md CHANGED
@@ -19,7 +19,7 @@ tags:
 - NeMo
 license: cc-by-4.0
 model-index:
-- name:
+- name: parakeet-tdt_ctc-0.6b-ja
   results:
   - task:
       name: Automatic Speech Recognition
@@ -108,7 +108,7 @@ img {
 | [![Language](https://img.shields.io/badge/Language-ja-lightgrey#model-badge)](#datasets)
 
 
-`
+`parakeet-tdt_ctc-0.6b-ja` is an ASR model that transcribes Japanese speech with punctuation. This model was developed by the [NVIDIA NeMo](https://github.com/NVIDIA/NeMo) team.
 It is an XL version of the Hybrid FastConformer [1] TDT-CTC [2] model (around 0.6B parameters).
 See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#fast-conformer) for complete architecture details.
 
@@ -116,7 +116,7 @@ See the [model architecture](#model-architecture) section and [NeMo documentatio
 
 To train, fine-tune, or play with the model, you will need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend you install it after you've installed the latest PyTorch version.
 ```
-pip install nemo_toolkit['
+pip install nemo_toolkit['asr']
 ```
 
 ## How to Use this Model
@@ -127,7 +127,7 @@ The model is available for use in the NeMo toolkit [3], and can be used as a pre
 
 ```python
 import nemo.collections.asr as nemo_asr
-asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/
+asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/parakeet-tdt_ctc-0.6b-ja")
 ```
 
 ### Transcribing using Python
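For context on the hunk above: once the checkpoint is loaded, transcription is typically a single `transcribe()` call. The sketch below is illustrative rather than taken from this README; the audio filename is a placeholder.

```python
import nemo.collections.asr as nemo_asr

# Load the checkpoint referenced in the hunk above.
asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/parakeet-tdt_ctc-0.6b-ja")

# "sample_ja.wav" is a placeholder path; transcribe() accepts a list of audio files.
output = asr_model.transcribe(["sample_ja.wav"])
print(output)  # exact return structure (plain strings vs. hypothesis objects) varies by NeMo version
```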
@@ -142,7 +142,7 @@ By default model uses TDT to transcribe the audio files, to switch decoder to us
 
 ```shell
 python [NEMO_GIT_FOLDER]/examples/asr/transcribe_speech.py \
-  pretrained_name="nvidia/
+  pretrained_name="nvidia/parakeet-tdt_ctc-0.6b-ja" \
   audio_dir="<DIRECTORY CONTAINING AUDIO FILES>"
 ```
 
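The hunk header above mentions switching the decoder from TDT to CTC, but the exact option is truncated in this view. A common way to do this with NeMo hybrid checkpoints is sketched below; the `change_decoding_strategy` usage is an assumption to verify against your installed NeMo version, not something shown in this commit.

```python
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/parakeet-tdt_ctc-0.6b-ja")

# Assumption: hybrid TDT-CTC models let you select the auxiliary CTC head via
# change_decoding_strategy; check the signature against the NeMo docs for your version.
asr_model.change_decoding_strategy(decoding_cfg=None, decoder_type="ctc")

output = asr_model.transcribe(["sample_ja.wav"])  # "sample_ja.wav" is a placeholder path
print(output)
```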
@@ -160,7 +160,7 @@ This model uses a Hybrid FastConformer-TDT-CTC architecture.
 
 FastConformer [1] is an optimized version of the Conformer model with 8x depthwise-separable convolutional downsampling. You may find more information on the details of FastConformer here: [Fast-Conformer Model](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#fast-conformer).
 
-TDT (Token-and-Duration Transducer) [2] is a generalization of conventional Transducers by decoupling token and duration predictions. Unlike conventional Transducers which produces a lot of blanks during inference, a TDT model can skip majority of blank predictions by using the duration output (up to 4 frames for this
+TDT (Token-and-Duration Transducer) [2] is a generalization of conventional Transducers that decouples token and duration predictions. Unlike conventional Transducers, which produce many blanks during inference, a TDT model can skip the majority of blank predictions by using the duration output (up to 4 frames for this `parakeet-tdt_ctc-0.6b-ja` model), which brings a significant inference speed-up. Details of TDT can be found here: [Efficient Sequence Transduction by Jointly Predicting Tokens and Durations](https://arxiv.org/abs/2304.06795).
 
 ## Training
 
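To make the frame-skipping idea in the TDT paragraph concrete, here is a small self-contained toy, not NeMo code and not this model's actual decoder: each step emits a token plus a duration, so runs of blank frames are crossed in one jump instead of one blank prediction per frame.

```python
# Toy illustration of TDT-style decoding (not NeMo code, not this model's decoder).
# Each step predicts a token AND a duration; blank-heavy regions are skipped in one jump,
# whereas a conventional Transducer advances one frame per blank prediction.
BLANK = "<blank>"

# Hypothetical per-frame (token, duration) predictions for a 16-frame utterance.
predictions = {
    0: ("こ", 1), 1: ("ん", 1), 2: (BLANK, 4), 6: ("に", 1),
    7: ("ち", 1), 8: ("は", 2), 10: (BLANK, 4), 14: (BLANK, 2),
}

num_frames, t, steps, hyp = 16, 0, 0, []
while t < num_frames:
    token, duration = predictions.get(t, (BLANK, 1))
    if token != BLANK:
        hyp.append(token)
    t += max(duration, 1)  # simplification: real TDT also allows duration 0 (stay on the same frame)
    steps += 1

print("".join(hyp))                                       # -> こんにちは
print(f"{steps} decoding steps for {num_frames} frames")  # fewer steps than frames
```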