johaness14 committed
Commit 77dbfbf · verified · 1 parent: c0af1d6

Update README.md

Files changed (1):
README.md +84 -2
@@ -37,8 +37,6 @@ This model is intended for transcribing spoken Javanese language from audio reco

  The model was trained on the OpenSLR41 dataset, split into two sections (training and testing), using a single A100 GPU; training took 4-5 hours.

- ## Training procedure
-
  ### Training hyperparameters

  The following hyperparameters were used during training:
@@ -83,6 +81,90 @@ The following hyperparameters were used during training:
  | 0.0564 | 70.8215 | 50000 | 0.2711 | 0.1551 |
  | 0.0562 | 73.6544 | 52000 | 0.2727 | 0.1523 |

+ ### How to run (Gradio Web)
+ ```python
+ import torch
+ import torchaudio
+ import gradio as gr
+ import numpy as np
+ from transformers import pipeline, AutoProcessor, AutoModelForCTC
+
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+ # Load the model and processor
+ MODEL_NAME = "<fill in your model name>"
+ processor = AutoProcessor.from_pretrained(MODEL_NAME)
+ model = AutoModelForCTC.from_pretrained(MODEL_NAME)
+
+ # Move model to GPU (if available)
+ model.to(device)
+
+ # Create the pipeline with the model and processor (it needs torchaudio installed to resample other sampling rates)
+ transcriber = pipeline("automatic-speech-recognition", model=model, tokenizer=processor.tokenizer, feature_extractor=processor.feature_extractor, device=device)
+
+ def transcribe(audio):
+     sr, y = audio  # Gradio passes (sampling_rate, numpy_array)
+     y = y.astype(np.float32)
+     y = y.mean(axis=1) if y.ndim > 1 else y  # downmix stereo to mono
+     y /= max(np.max(np.abs(y)), 1e-8)  # peak-normalize; avoid dividing by zero on silence
+     return transcriber({"sampling_rate": sr, "raw": y})["text"]
+
+ demo = gr.Interface(
+     transcribe,
+     gr.Audio(sources=["upload"]),
+     "text",
+ )
+
+ demo.launch(share=True)
+ ```
+
+ ### How to run (local audio file)
+ ```python
+ import torch
+ import torchaudio
+ import gradio as gr
+ import numpy as np
+ from transformers import pipeline, AutoProcessor, AutoModelForCTC
+
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+ # Load the model and processor
+ MODEL_NAME = "<fill in your model name>"
+ processor = AutoProcessor.from_pretrained(MODEL_NAME)
+ model = AutoModelForCTC.from_pretrained(MODEL_NAME)
+
+ # Move model to GPU (if available)
+ model.to(device)
+
+ # Load audio file
+ AUDIO_PATH = "<replace 'path_to_audio_file.wav' with the actual path to your audio file>"
+ audio_input, sample_rate = torchaudio.load(AUDIO_PATH)
+
+ # Ensure the audio is mono (1 channel)
+ if audio_input.shape[0] > 1:
+     audio_input = torch.mean(audio_input, dim=0, keepdim=True)
+
+ # Resample audio to 16 kHz if necessary
+ if sample_rate != 16000:
+     resampler = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=16000)
+     audio_input = resampler(audio_input)
+
+ # Process the audio input (the feature extractor expects a 1-D NumPy array)
+ input_values = processor(audio_input.squeeze().numpy(), sampling_rate=16000, return_tensors="pt").input_values
+
+ # Move input values to the same device as the model
+ input_values = input_values.to(device)
+
+ # Perform inference
+ with torch.no_grad():
+     logits = model(input_values).logits
+
+ # Greedy CTC decoding: take the most likely token per frame, then decode to text
+ predicted_ids = torch.argmax(logits, dim=-1)
+ transcription = processor.batch_decode(predicted_ids)[0]
+
+ print("Transcription:", transcription)
+ ```

  ### Framework versions
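The local-file script turns the model output into text with `torch.argmax(logits, dim=-1)` followed by `processor.batch_decode`. For a wav2vec2-style CTC tokenizer, that decode step roughly merges runs of identical frame ids, drops the blank (padding) token, and maps the word-delimiter token to a space. Below is a minimal, dependency-free sketch of that greedy CTC decoding, assuming a blank id of `0` and a `|` delimiter; the toy vocabulary `VOCAB` and the helper `ctc_greedy_decode` are hypothetical and are not this model's actual vocabulary or the library's implementation.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Greedy CTC decoding of one sequence of per-frame token ids.

    Assumptions (illustrative only): the blank token has id `blank_id`
    and words are separated by the `word_delimiter` token.
    """
    # 1. Merge runs of identical ids; CTC may emit the same id for many frames.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # 2. Drop the blank token, which only separates frames.
    tokens = [id_to_token[t] for t in collapsed if t != blank_id]
    # 3. Replace the word delimiter with spaces to form words.
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary (not this model's real vocab.json)
VOCAB = {0: "<pad>", 1: "|", 2: "a", 3: "k", 4: "u"}

# Repeated ids collapse into one character; prints: aku
print(ctc_greedy_decode([2, 2, 0, 3, 0, 0, 4, 4], VOCAB))
# A blank between two identical ids keeps a genuine double letter; prints: aa
print(ctc_greedy_decode([2, 0, 2], VOCAB))
# The delimiter token becomes a space; prints: aku ku
print(ctc_greedy_decode([2, 2, 0, 3, 3, 4, 1, 1, 3, 0, 4], VOCAB))
```

The blank token is what lets CTC spell a real double letter: without a blank between two identical ids, they would be merged into one character. The real `batch_decode` also depends on tokenizer settings such as which special tokens it skips, so treat this as an explanation of the idea rather than a replacement for it.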