johaness14 committed on
Commit 9184ae9 · verified · 1 Parent(s): f57c6d8

Update README.md

  1. README.md +84 -2
README.md CHANGED
@@ -37,8 +37,6 @@ This model is intended for transcribing spoken Javanese language from audio reco
37
 
38
  The model uses the OpenSLR41 dataset, split into two sections (training and testing). It was trained on a single A100 GPU, with a training duration of 4-5 hours.
39
 
40
- ## Training procedure
41
-
42
  ### Training hyperparameters
43
 
44
  The following hyperparameters were used during training:
@@ -79,6 +77,90 @@ The following hyperparameters were used during training:
79
  | 0.0328 | 59.4901 | 42000 | 0.2887 | 0.1654 |
80
  | 0.0324 | 62.3229 | 44000 | 0.2843 | 0.1502 |
81
 
82
 
83
  ### Framework versions
84
 
 
37
 
38
  The model uses the OpenSLR41 dataset, split into two sections (training and testing). It was trained on a single A100 GPU, with a training duration of 4-5 hours.
39
 
 
 
40
  ### Training hyperparameters
41
 
42
  The following hyperparameters were used during training:
 
77
  | 0.0328 | 59.4901 | 42000 | 0.2887 | 0.1654 |
78
  | 0.0324 | 62.3229 | 44000 | 0.2843 | 0.1502 |
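Note on reading the table rows above: the column headers fall outside this hunk, so treating the last column as the word error rate (WER) is an assumption. Under that assumption, the minimal sketch below shows how such a value is typically computed: the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The function name `word_error_rate` and the sample sentences are hypothetical and are not taken from the model card.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word and one dropped word against a 4-word reference
print(word_error_rate("one two three four", "one too three"))
```

If the assumption holds, a value such as 0.1502 would mean roughly 15 word errors per 100 reference words on the evaluation split.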
79
 
80
+ ### How to run (Gradio Web)
+ ```python
+ import torch
+ import torchaudio
+ import gradio as gr
+ import numpy as np
+ from transformers import pipeline, AutoProcessor, AutoModelForCTC
+
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+ # Load the model and processor
+ MODEL_NAME = "<your model ID or local path>"
+ processor = AutoProcessor.from_pretrained(MODEL_NAME)
+ model = AutoModelForCTC.from_pretrained(MODEL_NAME)
+
+ # Move the model to the selected device (GPU if available, otherwise CPU)
+ model.to(device)
+
+ # Create the pipeline with the model and processor
+ transcriber = pipeline(
+     "automatic-speech-recognition",
+     model=model,
+     tokenizer=processor.tokenizer,
+     feature_extractor=processor.feature_extractor,
+     device=device,
+ )
+
+ def transcribe(audio):
+     # Gradio passes the audio as a (sample_rate, samples) tuple
+     sr, y = audio
+     y = y.astype(np.float32)
+
+     # Down-mix multi-channel audio, which arrives as (samples, channels)
+     if y.ndim > 1:
+         y = y.mean(axis=1)
+
+     # Peak-normalize, skipping silent clips to avoid dividing by zero
+     peak = np.max(np.abs(y))
+     if peak > 0:
+         y /= peak
+
+     return transcriber({"sampling_rate": sr, "raw": y})["text"]
+
+ demo = gr.Interface(
+     transcribe,
+     gr.Audio(sources=["upload"]),
+     "text",
+ )
+
+ demo.launch(share=True)
+ ```
116
+
117
+ ### How to run
+ ```python
+ import torch
+ import torchaudio
+ from transformers import AutoProcessor, AutoModelForCTC
+
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+ # Load the model and processor
+ MODEL_NAME = "<your model ID or local path>"
+ processor = AutoProcessor.from_pretrained(MODEL_NAME)
+ model = AutoModelForCTC.from_pretrained(MODEL_NAME)
+
+ # Move the model to the selected device (GPU if available, otherwise CPU)
+ model.to(device)
+
+ # Load the audio file
+ AUDIO_PATH = "<path to your audio file>"
+ audio_input, sample_rate = torchaudio.load(AUDIO_PATH)
+
+ # Ensure the audio is mono (1 channel)
+ if audio_input.shape[0] > 1:
+     audio_input = torch.mean(audio_input, dim=0, keepdim=True)
+
+ # Resample to 16 kHz if necessary
+ if sample_rate != 16000:
+     resampler = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=16000)
+     audio_input = resampler(audio_input)
+
+ # Convert the waveform to model input values
+ input_values = processor(
+     audio_input.squeeze().numpy(), sampling_rate=16000, return_tensors="pt"
+ ).input_values
+
+ # Move the input values to the same device as the model
+ input_values = input_values.to(device)
+
+ # Perform inference
+ with torch.no_grad():
+     logits = model(input_values).logits
+
+ # Decode the logits to text (greedy: most likely token per frame)
+ predicted_ids = torch.argmax(logits, dim=-1)
+ transcription = processor.batch_decode(predicted_ids)[0]
+
+ print("Transcription:", transcription)
+ ```
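Note on the decoding step in the script above: taking the `argmax` over the logits and passing the result to `processor.batch_decode` amounts to greedy CTC decoding. The sketch below illustrates the idea on a toy vocabulary, not the library's actual implementation: consecutive repeated frame predictions are merged, then the blank token is dropped. The vocabulary, the `blank_id=0` default, and the function name `ctc_greedy_decode` are assumptions made for illustration; the real tokenizer also handles word delimiters and special tokens, which are omitted here.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0):
    """Collapse per-frame token ids into text using the CTC rules."""
    # Step 1: merge runs of identical consecutive ids into a single id.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # Step 2: drop blanks, which only separate genuine repeated characters.
    return "".join(id_to_token[i] for i in collapsed if i != blank_id)


# Hypothetical toy vocabulary; id 0 plays the role of the CTC blank
vocab = {0: "<blank>", 1: "a", 2: "k", 3: "u"}

print(ctc_greedy_decode([1, 1, 0, 2, 2, 0, 0, 3, 3], vocab))
print(ctc_greedy_decode([1, 0, 1], vocab))
```

The second call shows why the blank matters: without a blank frame between them, two identical characters in a row would be merged into one.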
164
 
165
  ### Framework versions
166