transiteration committed
Commit 63f1a81 · 1 parent: 445ff6d
Update README.md

README.md CHANGED
@@ -18,7 +18,7 @@ tags:
 
 In order to prepare, adjust, or experiment with the model, it's necessary to install [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo) [1].
 We advise installing it once you've installed the most recent version of PyTorch.
-This model is trained on NVIDIA GeForce RTX 2070
+This model was trained on an NVIDIA GeForce RTX 2070:\
 Python 3.7.15\
 NumPy 1.21.6\
 PyTorch 1.21.1\
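For context, a minimal sketch of restoring the checkpoint once the toolkit is installed; the `pip` extra shown in the comment follows NeMo's documented install options, and the local file name matches the `model_path` used in the transcription command below:

```python
# A minimal sketch, assuming NeMo's ASR collection is installed on top of
# a recent PyTorch, e.g.: pip install -U nemo_toolkit['asr']
import nemo.collections.asr as nemo_asr

# QuartzNet checkpoints are CTC models, so EncDecCTCModel can restore
# the fine-tuned weights from the local .nemo archive.
model = nemo_asr.models.EncDecCTCModel.restore_from("stt_kz_quartznet15x5.nemo")
```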
@@ -53,7 +53,7 @@ python3 transcribe_speech.py model_path=stt_kz_quartznet15x5.nemo dataset_manife
 
 ## Input and Output
 
-This model can take input from mono-channel audio .WAV files with a sample rate of 16,000 KHz
+This model accepts mono-channel .WAV audio files with a sample rate of 16,000 Hz (16 kHz).\
 Then, this model gives you the spoken words in a text format for a given audio sample.
 
 ## Model Architecture
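To illustrate this I/O contract, a hedged usage sketch: `sample.wav` is a hypothetical file name, assumed to already be a mono 16 kHz .WAV file as required above, and `model` is the checkpoint restored in the earlier sketch:

```python
# A minimal sketch, assuming `model` was restored as shown earlier and that
# sample.wav (hypothetical) is already a mono, 16 kHz .WAV file.
transcriptions = model.transcribe(["sample.wav"])

# transcribe() returns one text hypothesis per input file.
print(transcriptions[0])
```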
@@ -72,8 +72,8 @@ Average WER: 15.53%
 
 ## Limitations
 
-Because the GPU has limited power, we used a lightweight model architecture for fine-tuning
-In general, this makes it faster for inference but might show less overall performance
+Because the GPU has limited power, we used a lightweight model architecture for fine-tuning.\
+In general, this makes it faster at inference but may show lower overall performance.\
 In addition, if the speech includes technical terms or dialect words the model hasn't learned, it may not work as well.
 
 ## References