The model is accessible within the NeMo toolkit [1] and can serve as a pre-trained checkpoint for inference or for fine-tuning on a different dataset.

#### How to Import

```
import nemo.collections.asr as nemo_asr
model = nemo_asr.models.EncDecCTCModel.restore_from(restore_path="stt_kz_quartznet15x5.nemo")
```

#### How to Train

```
python3 train.py --train_manifest path/to/manifest.json --val_manifest path/to/manifest.json --batch_size BATCH_SIZE --num_epochs NUM_EPOCHS --model_save_path path/to/save/model.nemo
```
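
The `--train_manifest` and `--val_manifest` arguments point to NeMo-style manifests: JSON-lines files with one utterance per line, each carrying `audio_filepath`, `duration` (in seconds), and `text`. A minimal sketch for building one — the file paths and the transcript below are illustrative placeholders, not files from this repository:

```python
import json

def write_manifest(entries, path):
    # NeMo manifests are JSON-lines: one JSON object per utterance with
    # audio_filepath, duration (seconds), and text (the transcript).
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")

entries = [
    {"audio_filepath": "clips/utt1.wav", "duration": 3.2, "text": "сәлем әлем"},
]
write_manifest(entries, "manifest.json")

# Read it back to confirm the format round-trips.
with open("manifest.json", encoding="utf-8") as f:
    lines = [json.loads(line) for line in f]
```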
|
48 |
+
|
49 |
+
#### How to Evaluate
|
50 |
+
|
51 |
```
|
52 |
+
python3 evaluate.py --model_path=/path/to/stt_kz_quartznet15x5.nemo --test_manifest path/to/manifest.json"
|
53 |
```
|
54 |
+
|
55 |
+
#### How to Transcribe Audio File
|
56 |
+
|
57 |
+
We can get a sample audio to test the model:
|
58 |
```
|
59 |
+
wget https://asr-kz-example.s3.us-west-2.amazonaws.com/sample_kz.wav
|
60 |
```
|
61 |
+
Then this line of code is to transcribe the single audio:
|
|
|
62 |
```
|
63 |
+
python3 transcibe.py --model_path /path/to/stt_kz_quartznet15x5.nemo --audio_file_path path/to/audio/file
|
64 |
```
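
NeMo's QuartzNet checkpoints are typically trained on 16 kHz mono PCM audio, so it is worth confirming your own recordings match that format before transcribing. A stdlib-only sketch — the synthetic one-second clip here merely stands in for a real file such as `sample_kz.wav`:

```python
import wave

def check_wav(path, expected_rate=16000):
    # Return (is_ok, rate, channels, seconds) for a PCM WAV file.
    with wave.open(path, "rb") as w:
        rate, channels = w.getframerate(), w.getnchannels()
        seconds = w.getnframes() / rate
    return rate == expected_rate and channels == 1, rate, channels, seconds

# Demo: write a synthetic one-second silent 16 kHz mono clip, then check it.
with wave.open("demo.wav", "wb") as w:
    w.setnchannels(1)       # mono
    w.setsampwidth(2)       # 16-bit samples
    w.setframerate(16000)   # 16 kHz
    w.writeframes(b"\x00\x00" * 16000)

ok, rate, channels, seconds = check_wav("demo.wav")
```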

## Input and Output

## Performance

The model achieved:\
Average WER: 13.53%\
using **Greedy Decoding**.
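
WER here is the standard word-level edit distance — substitutions, insertions, and deletions — divided by the number of reference words. A minimal sketch of the metric (for real evaluations a tested library is preferable):

```python
def wer(reference, hypothesis):
    # Word error rate: word-level Levenshtein distance divided by the
    # number of reference words, computed with a rolling DP row.
    ref, hyp = reference.split(), hypothesis.split()
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(hyp) + 1):
            cur = min(d[j] + 1,            # deletion
                      d[j - 1] + 1,        # insertion
                      prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev, d[j] = d[j], cur
    return d[len(hyp)] / len(ref)

score = wer("q w e r", "q w x r")  # one substitution in four words -> 0.25
```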

## Limitations

Because of limited GPU capacity, we used a lightweight model architecture for fine-tuning.\