luckyt committed on
Commit e506013
1 Parent(s): fb7cbc4

Update README.md

Files changed (1)
  1. README.md +81 -0
README.md CHANGED

---
license: mit
---

# Teochew Whisper Medium

This model is a fine-tuned version of OpenAI's Whisper medium model for recognizing Teochew (潮州话), a language in the Min Nan family spoken in southern China.

For detailed documentation of how this model was trained, see this video: https://www.youtube.com/watch?v=JH_78KmP4Zk
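
As a minimal quick-start sketch (not part of the original card), the model can also be run through the transformers `pipeline` API. This assumes the repository hosts only the fine-tuned weights, so the tokenizer and feature extractor are taken from the base `openai/whisper-medium` checkpoint, mirroring the full example script below:

```python
from transformers import pipeline

# Hypothetical quick start: fine-tuned weights from the model repo,
# tokenizer/feature extractor from the base Whisper medium checkpoint.
asr = pipeline(
    "automatic-speech-recognition",
    model="efficient-nlp/teochew-whisper-medium",
    tokenizer="openai/whisper-medium",
    feature_extractor="openai/whisper-medium",
)

# "sample.wav" is a placeholder path to a short (<10 s) Teochew recording.
print(asr("sample.wav")["text"])
```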

## Training Data

The model was fine-tuned on approximately 35 hours of audio derived from Teochew-language movies, TV shows, and comedies.

## Evaluation Metrics

On our private test set, we obtained the following Word Error Rate (WER) metrics:

- Careful Speech: 0.31
- Conversational Speech: 0.68
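
For reference, WER is the word-level edit distance divided by the number of reference words. A small illustration (not necessarily how the numbers above were produced) using the jiwer library on transcripts that have been segmented into space-separated words:

```python
import jiwer

# Hypothetical example transcripts, pre-segmented into space-separated words.
reference = "这 是 一 个 例子"
hypothesis = "这 是 个 例子"

# One deletion out of five reference words -> WER = 0.2
print(jiwer.wer(reference, hypothesis))
```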

Known Limitations: the model was trained on short audio clips and may struggle with audio longer than 10 seconds; a simple chunking workaround is sketched after the example script below.

## Example Code

The following script downloads the model and launches a Gradio demo that transcribes an audio file:

```python
import torch
import torchaudio
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import gradio as gr

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
WHISPER_SAMPLE_RATE = 16000  # Whisper expects 16 kHz mono audio

# The tokenizer/feature extractor come from the base checkpoint;
# the fine-tuned Teochew weights come from this repository.
processor = WhisperProcessor.from_pretrained("openai/whisper-medium")
model = WhisperForConditionalGeneration.from_pretrained(
    "efficient-nlp/teochew-whisper-medium"
).to(DEVICE)


def preprocess_audio(audio_path: str) -> torch.Tensor:
    """Load an audio file and return a 16 kHz mono waveform."""
    audio, sample_rate = torchaudio.load(audio_path)
    # Resample if necessary
    if sample_rate != WHISPER_SAMPLE_RATE:
        resampler = torchaudio.transforms.Resample(
            orig_freq=sample_rate, new_freq=WHISPER_SAMPLE_RATE
        )
        audio = resampler(audio)
    # Convert to mono
    if audio.shape[0] > 1:
        audio = torch.mean(audio, dim=0)
    return audio.squeeze()


def transcribe(audio_path: str) -> str:
    audio_input = preprocess_audio(audio_path)
    input_features = processor(
        audio_input,
        sampling_rate=WHISPER_SAMPLE_RATE,
        return_tensors="pt",
    ).input_features.to(DEVICE)

    # Decode as Chinese text; language and task are set via the decoder prompt
    forced_decoder_ids = processor.get_decoder_prompt_ids(
        language="Chinese", task="transcribe"
    )
    predicted_ids = model.generate(
        input_features, forced_decoder_ids=forced_decoder_ids
    )
    transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
    return transcription


iface = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(type="filepath"),
    outputs="text",
    title="Teochew Speech Recognition",
)
iface.launch()
```
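
As noted under Known Limitations, recordings longer than about 10 seconds may degrade accuracy. A hedged workaround sketch (not from the original card): split the waveform into short segments and transcribe each one, reusing `processor`, `model`, `DEVICE`, `WHISPER_SAMPLE_RATE`, and `preprocess_audio` from the script above. The 10-second segment length is an assumption based on the stated limitation.

```python
# Workaround sketch for long recordings: transcribe fixed-length segments.
# Assumes the objects defined in the example script above are already in scope.
SEGMENT_SECONDS = 10  # assumed segment length, based on the stated limitation


def transcribe_long(audio_path: str) -> str:
    audio = preprocess_audio(audio_path)
    segment_samples = SEGMENT_SECONDS * WHISPER_SAMPLE_RATE
    forced_decoder_ids = processor.get_decoder_prompt_ids(
        language="Chinese", task="transcribe"
    )
    pieces = []
    for start in range(0, audio.shape[-1], segment_samples):
        segment = audio[start : start + segment_samples]
        input_features = processor(
            segment, sampling_rate=WHISPER_SAMPLE_RATE, return_tensors="pt"
        ).input_features.to(DEVICE)
        predicted_ids = model.generate(
            input_features, forced_decoder_ids=forced_decoder_ids
        )
        pieces.append(
            processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
        )
    return "".join(pieces)
```

Fixed-length cuts can split a word across two segments; in practice, splitting on silence (for example with a voice-activity detector) or overlapping adjacent segments would give cleaner results.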