The model is accessible within the NeMo toolkit [1] and can serve as a pre-trained checkpoint for inference or for fine-tuning on a different dataset.

#### How to Import

```
import nemo.collections.asr as nemo_asr
model = nemo_asr.models.EncDecCTCModel.restore_from(restore_path="stt_kz_quartznet15x5.nemo")
```

#### How to Train

```
python3 train.py --train_manifest path/to/manifest.json --val_manifest path/to/manifest.json --batch_size BATCH_SIZE --num_epochs NUM_EPOCHS --model_save_path path/to/save/model.nemo
```
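
The `--train_manifest` and `--val_manifest` arguments point to NeMo-style manifests: JSON-lines files with one utterance per line, each carrying `audio_filepath`, `duration` (in seconds), and `text`. A minimal sketch for building one — the file paths and the transcript below are illustrative placeholders, not files from this repository:

```python
import json

def write_manifest(entries, path):
    # NeMo manifests are JSON-lines: one JSON object per utterance with
    # audio_filepath, duration (seconds), and text (the transcript).
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")

entries = [
    {"audio_filepath": "clips/utt1.wav", "duration": 3.2, "text": "сәлем әлем"},
]
write_manifest(entries, "manifest.json")

# Read it back to confirm the format round-trips.
with open("manifest.json", encoding="utf-8") as f:
    lines = [json.loads(line) for line in f]
```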
|
48 |
+
|
49 |
+
#### How to Evaluate
|
50 |
+
|
51 |
```
|
52 |
+
python3 evaluate.py --model_path=/path/to/stt_kz_quartznet15x5.nemo --test_manifest path/to/manifest.json"
|
53 |
```
|
54 |
+
|
55 |
+
#### How to Transcribe Audio File
|
56 |
+
|
57 |
+
We can get a sample audio to test the model:
|
58 |
```
|
59 |
+
wget https://asr-kz-example.s3.us-west-2.amazonaws.com/sample_kz.wav
|
60 |
```
|
61 |
+
Then this line of code is to transcribe the single audio:
|
|
|
62 |
```
|
63 |
+
python3 transcibe.py --model_path /path/to/stt_kz_quartznet15x5.nemo --audio_file_path path/to/audio/file
|
64 |
```
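
NeMo's QuartzNet checkpoints are typically trained on 16 kHz mono PCM audio, so it is worth confirming your own recordings match that format before transcribing. A stdlib-only sketch — the synthetic one-second clip here merely stands in for a real file such as `sample_kz.wav`:

```python
import wave

def check_wav(path, expected_rate=16000):
    # Return (is_ok, rate, channels, seconds) for a PCM WAV file.
    with wave.open(path, "rb") as w:
        rate, channels = w.getframerate(), w.getnchannels()
        seconds = w.getnframes() / rate
    return rate == expected_rate and channels == 1, rate, channels, seconds

# Demo: write a synthetic one-second silent 16 kHz mono clip, then check it.
with wave.open("demo.wav", "wb") as w:
    w.setnchannels(1)       # mono
    w.setsampwidth(2)       # 16-bit samples
    w.setframerate(16000)   # 16 kHz
    w.writeframes(b"\x00\x00" * 16000)

ok, rate, channels, seconds = check_wav("demo.wav")
```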

## Input and Output

## Performance

The model achieved:\
Average WER: 13.53%\
using **Greedy Decoding**.
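
WER here is the standard word-level edit distance — substitutions, insertions, and deletions — divided by the number of reference words. A minimal sketch of the metric (for real evaluations a tested library is preferable):

```python
def wer(reference, hypothesis):
    # Word error rate: word-level Levenshtein distance divided by the
    # number of reference words, computed with a rolling DP row.
    ref, hyp = reference.split(), hypothesis.split()
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(hyp) + 1):
            cur = min(d[j] + 1,            # deletion
                      d[j - 1] + 1,        # insertion
                      prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev, d[j] = d[j], cur
    return d[len(hyp)] / len(ref)

score = wer("q w e r", "q w x r")  # one substitution in four words -> 0.25
```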

## Limitations

Because of limited GPU capacity, we used a lightweight model architecture for fine-tuning.\