transiteration committed on
Commit 63f1a81 · 1 Parent(s): 445ff6d

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -18,7 +18,7 @@ tags:
 
  In order to prepare, adjust, or experiment with the model, it's necessary to install [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo) [1].
  We advise installing it once you've installed the most recent version of PyTorch.
- This model is trained on NVIDIA GeForce RTX 2070:
+ This model is trained on NVIDIA GeForce RTX 2070:\
  Python 3.7.15\
  NumPy 1.21.6\
  PyTorch 1.21.1\
@@ -53,7 +53,7 @@ python3 transcribe_speech.py model_path=stt_kz_quartznet15x5.nemo dataset_manife
 
  ## Input and Output
 
- This model can take input from mono-channel audio .WAV files with a sample rate of 16,000 KHz.
+ This model can take input from mono-channel audio .WAV files with a sample rate of 16,000 KHz.\
  Then, this model gives you the spoken words in a text format for a given audio sample.
 
  ## Model Architecture
@@ -72,8 +72,8 @@ Average WER: 15.53%
 
  ## Limitations
 
- Because the GPU has limited power, we used a lightweight model architecture for fine-tuning.
- In general, this makes it faster for inference but might show less overall performance.
+ Because the GPU has limited power, we used a lightweight model architecture for fine-tuning.\
+ In general, this makes it faster for inference but might show less overall performance.\
  In addition, if the speech includes technical terms or dialect words the model hasn't learned, it may not work as well.
 
  ## References
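For reference, the sections touched by this commit describe a model loaded from `stt_kz_quartznet15x5.nemo` that transcribes 16 kHz mono-channel WAV audio into text. Below is a minimal usage sketch of that flow, assuming the NeMo 1.x Python API (`ASRModel.restore_from` and `transcribe`); the file name `audio_sample.wav` is illustrative and not part of the commit.

```python
# Minimal sketch, assuming the NeMo 1.x ASR API; "audio_sample.wav" is an illustrative file name.
from nemo.collections.asr.models import ASRModel

# Restore the fine-tuned checkpoint referenced in the README.
model = ASRModel.restore_from(restore_path="stt_kz_quartznet15x5.nemo")

# transcribe() takes a list of audio file paths (mono-channel 16 kHz WAV) and returns decoded text.
predictions = model.transcribe(["audio_sample.wav"])
print(predictions[0])
```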