transiteration committed
Commit 63f1a81 · 1 parent: 445ff6d
Update README.md

README.md CHANGED
@@ -18,7 +18,7 @@ tags:
 
 In order to prepare, adjust, or experiment with the model, it's necessary to install [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo) [1].
 We advise installing it once you've installed the most recent version of PyTorch.
-This model is trained on NVIDIA GeForce RTX 2070
+This model was trained on an NVIDIA GeForce RTX 2070:\
 Python 3.7.15\
 NumPy 1.21.6\
 PyTorch 1.21.1\
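For context, a minimal sketch of restoring the checkpoint once the toolkit is installed; the `pip` extra shown in the comment follows NeMo's documented install options, and the local file name matches the `model_path` used in the transcription command below:

```python
# A minimal sketch, assuming NeMo's ASR collection is installed on top of
# a recent PyTorch, e.g.: pip install -U nemo_toolkit['asr']
import nemo.collections.asr as nemo_asr

# QuartzNet checkpoints are CTC models, so EncDecCTCModel can restore
# the fine-tuned weights from the local .nemo archive.
model = nemo_asr.models.EncDecCTCModel.restore_from("stt_kz_quartznet15x5.nemo")
```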
@@ -53,7 +53,7 @@ python3 transcribe_speech.py model_path=stt_kz_quartznet15x5.nemo dataset_manife
 
 ## Input and Output
 
-This model can take input from mono-channel audio .WAV files with a sample rate of 16,000 KHz
+This model accepts mono-channel .WAV audio files with a sample rate of 16,000 Hz (16 kHz).\
 Then, this model gives you the spoken words in a text format for a given audio sample.
 
 ## Model Architecture
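To illustrate this I/O contract, a hedged usage sketch: `sample.wav` is a hypothetical file name, assumed to already be a mono 16 kHz .WAV file as required above, and `model` is the checkpoint restored in the earlier sketch:

```python
# A minimal sketch, assuming `model` was restored as shown earlier and that
# sample.wav (hypothetical) is already a mono, 16 kHz .WAV file.
transcriptions = model.transcribe(["sample.wav"])

# transcribe() returns one text hypothesis per input file.
print(transcriptions[0])
```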
@@ -72,8 +72,8 @@ Average WER: 15.53%
 
 ## Limitations
 
-Because the GPU has limited power, we used a lightweight model architecture for fine-tuning
-In general, this makes it faster for inference but might show less overall performance
+Because the GPU has limited power, we used a lightweight model architecture for fine-tuning.\
+In general, this makes it faster at inference but may show lower overall performance.\
 In addition, if the speech includes technical terms or dialect words the model hasn't learned, it may not work as well.
 
 ## References