kaushal98b committed · Commit b669991 · 1 Parent(s): 4b7fd02

Update README.md

  1. README.md +61 -89
README.md CHANGED
@@ -1,91 +1,63 @@
  ---
- {}
  ---
-
- ## IndicConformer
-
- IndicConformer is an Hybrid RNNT conformer model built for Urdu.
-
- ## AI4Bharat NeMo:
-
- To load, train, fine-tune or play with the model you will need to install [AI4Bharat NeMo](https://github.com/AI4Bharat/NeMo). We recommend you install it using the command shown below
- ```
- git clone https://github.com/AI4Bharat/NeMo.git && cd NeMo && git checkout nemo-v2 && bash reinstall.sh
- ```
-
- ## Usage
-
- ```bash
- $ python inference.py --help
- usage: inference.py [-h] -c CHECKPOINT -f AUDIO_FILEPATH -d (cpu,cuda) -l LANGUAGE_CODE
-
- options:
- -h, --help show this help message and exit
- -c CHECKPOINT, --checkpoint CHECKPOINT
- Path to .nemo file
- -f AUDIO_FILEPATH, --audio_filepath AUDIO_FILEPATH
- Audio filepath
- -d (cpu,cuda), --device (cpu,cuda)
- Device (cpu/gpu)
- -l LANGUAGE_CODE, --language_code LANGUAGE_CODE
- Language Code (eg. hi)
- ```
-
- ## Example command
- ```
- python inference.py -c ai4b_indicConformer_hi.nemo -f hindi-16khz.wav -d cuda -l hi
- ```
- Expected output -
-
- ```
- Loading model..
- ...
- Transcibing..
- ----------
- Transcript:
- Took ** seconds.
- ----------
- ```
-
- ### Input
-
- This model accepts 16000 KHz Mono-channel Audio (wav files) as input.
-
- ### Output
-
- This model provides transcribed speech as a string for a given audio sample.
-
- ## Model Architecture
-
- This model is a onformer-Large model, consisting of 120M parameters, as the encoder, with a hybrid CTC-RNNT decoder. The model has 17 conformer blocks with
- 512 as the model dimension.
-
- ## Training
-
- <ADD INFORMATION ABOUT HOW THE MODEL WAS TRAINED - HOW MANY EPOCHS, AMOUNT OF COMPUTE ETC>
-
- ### Datasets
-
- <LIST THE NAME AND SPLITS OF DATASETS USED TO TRAIN THIS MODEL (ALONG WITH LANGUAGE AND ANY ADDITIONAL INFORMATION)>
-
- ## Performance
-
- <LIST THE SCORES OF THE MODEL -
- OR
- USE THE Hugging Face Evaluate LiBRARY TO UPLOAD METRICS>
-
- ## Limitations
-
- <DECLARE ANY POTENTIAL LIMITATIONS OF THE MODEL>
-
- Eg:
- Since this model was trained on publicly available speech datasets, the performance of this model might degrade for speech which includes technical terms, or vernacular that the model has not been trained on. The model might also perform worse for accented speech.
-
-
- ## References
-
- <ADD ANY REFERENCES HERE AS NEEDED>
-
- [1] [AI4Bharat NeMo Toolkit](https://github.com/AI4Bharat/NeMo)
-
-
  ---
+ license: mit
+ language:
+ - ur
+ pipeline_tag: automatic-speech-recognition
+ library_name: nemo
  ---
+ ## IndicConformer
+
+ IndicConformer is a Hybrid RNNT conformer model built for Urdu.
+
+ ## AI4Bharat NeMo:
+
+ To load, train, fine-tune or play with the model you will need to install [AI4Bharat NeMo](https://github.com/AI4Bharat/NeMo). We recommend you install it using the command shown below:
+ ```
+ git clone https://github.com/AI4Bharat/NeMo.git && cd NeMo && git checkout nemo-v2 && bash reinstall.sh
+ ```
+
+ ## Usage
+
+ ```bash
+ $ python inference.py --help
+ usage: inference.py [-h] -c CHECKPOINT -f AUDIO_FILEPATH -d (cpu,cuda) -l LANGUAGE_CODE
+ options:
+ -h, --help show this help message and exit
+ -c CHECKPOINT, --checkpoint CHECKPOINT
+ Path to .nemo file
+ -f AUDIO_FILEPATH, --audio_filepath AUDIO_FILEPATH
+ Audio filepath
+ -d (cpu,cuda), --device (cpu,cuda)
+ Device (cpu/gpu)
+ -l LANGUAGE_CODE, --language_code LANGUAGE_CODE
+ Language Code (eg. hi)
+ ```
+
+ ## Example command
+ ```
+ python inference.py -c indicconformer_stt_ur_hybrid_rnnt_large.nemo -f hindi-16khz.wav -d cuda -l hi
+ ```
+ Expected output:
+
+ ```
+ Loading model..
+ ...
+ Transcibing..
+ ----------
+ Transcript:
+ Took ** seconds.
+ ----------
+ ```
+
+ ### Input
+
+ This model accepts 16 kHz (16000 Hz) mono-channel audio (WAV files) as input.
+
+ ### Output
+
+ This model provides transcribed speech as a string for a given audio sample.
+
+ ## Model Architecture
+
+ This model is a conformer-Large model, consisting of 120M parameters, as the encoder, with a hybrid CTC-RNNT decoder. The model has 17 conformer blocks with
+ a model dimension of 512.
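
The Input section of the updated README asks for mono-channel WAV audio, which can be checked before running `inference.py`. The following is a minimal sketch, not part of the commit, assuming the stated input rate means 16000 Hz (16 kHz). It uses only Python's standard `wave` module, so it handles uncompressed PCM WAV files only. The function name `check_wav` and the two constants are hypothetical names chosen for illustration.

```python
import wave

# Assumed from the README's Input section: 16 kHz, mono-channel WAV.
EXPECTED_RATE_HZ = 16000
EXPECTED_CHANNELS = 1


def check_wav(path):
    """Return a list of mismatches; an empty list means the file fits.

    Hypothetical helper for illustration. It raises wave.Error if the
    file is not an uncompressed PCM WAV file.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != EXPECTED_RATE_HZ:
        problems.append(
            f"sample rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz"
        )
    if channels != EXPECTED_CHANNELS:
        problems.append(
            f"file has {channels} channels, expected {EXPECTED_CHANNELS}"
        )
    return problems
```

Calling `check_wav("hindi-16khz.wav")` on the file from the example command returns an empty list when the audio matches both requirements. Otherwise it returns one message per mismatch, which suggests resampling or downmixing the file first.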