# Whisper Pronunciation Scorer

This model assesses pronunciation quality for Korean speech. It is based on the openai/whisper-small model, fine-tuned on the Korea AI-Hub (https://www.aihub.or.kr/) foreigner Korean pronunciation evaluation dataset.

# Model Description

The Whisper Pronunciation Scorer takes an audio recording together with its corresponding text transcript and outputs a Korean pronunciation score on a scale of 1 to 5. It uses the encoder-decoder architecture of the Whisper model to extract speech features, mean-pools the decoder's final hidden states, and passes the result through an additional linear layer to predict the pronunciation score.

# How to Use

To use this model, follow these steps:

1. Install the required libraries (torch, torchaudio, transformers; see the install command below)
2. Load the model and processor
3. Prepare your audio file and text transcript
4. Predict the pronunciation score
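
The repository does not pin package versions, so for step 1 a plain install of the three libraries imported in the example should work:

```
pip install torch torchaudio transformers
```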

Here's a detailed example of how to use the model:

```python
import torch
import torch.nn as nn
import torchaudio
from transformers import WhisperProcessor, WhisperForConditionalGeneration


class WhisperPronunciationScorer(nn.Module):
    """Whisper encoder-decoder with a linear scoring head on top."""

    def __init__(self, pretrained_model):
        super().__init__()
        self.whisper = pretrained_model
        self.score_head = nn.Linear(self.whisper.config.d_model, 1)

    def forward(self, input_features, labels=None):
        outputs = self.whisper(input_features, labels=labels, output_hidden_states=True)
        # Mean-pool the decoder's last hidden states and map them to a single score
        last_hidden_state = outputs.decoder_hidden_states[-1]
        scores = self.score_head(last_hidden_state.mean(dim=1)).squeeze()
        return scores


def load_model(model_path, device):
    model_name = "openai/whisper-small"
    processor = WhisperProcessor.from_pretrained(model_name)
    pretrained_model = WhisperForConditionalGeneration.from_pretrained(model_name)
    model = WhisperPronunciationScorer(pretrained_model).to(device)
    model.load_state_dict(torch.load(model_path, map_location=device))
    model.eval()
    return model, processor


def predict_pronunciation_score(model, processor, audio_path, transcript, device):
    # Load the audio, downmix to mono if needed, and resample to the 16 kHz Whisper expects
    audio, sr = torchaudio.load(audio_path)
    if audio.shape[0] > 1:
        audio = audio.mean(dim=0, keepdim=True)
    if sr != 16000:
        audio = torchaudio.functional.resample(audio, sr, 16000)
    input_features = processor(
        audio.squeeze().numpy(), sampling_rate=16000, return_tensors="pt"
    ).input_features.to(device)

    # Tokenize the ground-truth transcript to use as decoder labels
    labels = processor(text=transcript, return_tensors="pt").input_ids.to(device)

    # Predict the score
    with torch.no_grad():
        score = model(input_features, labels)
    return score.item()


# Load the model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_path = "path/to/your/model.pth"
model, processor = load_model(model_path, device)

# Run a prediction
audio_path = "path/to/your/audio.wav"
transcript = "안녕하세요"
score = predict_pronunciation_score(model, processor, audio_path, transcript, device)
print(f"Predicted pronunciation score: {score:.2f}")
```
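
The training procedure itself is not included in this repository. As a rough, hypothetical sketch of how such a score head could be fine-tuned, the loop below treats scoring as regression with an MSE loss against the 1-to-5 labels; `train_loader`, which would yield `(input_features, labels, target_scores)` batches built from the AI-Hub data, is an illustrative placeholder, not part of the published code:

```python
import torch.nn as nn

def train_one_epoch(model, train_loader, optimizer, device):
    """Hypothetical fine-tuning loop: regress predicted scores onto 1-5 labels."""
    criterion = nn.MSELoss()
    model.train()
    total_loss = 0.0
    for input_features, labels, target_scores in train_loader:  # placeholder loader
        input_features = input_features.to(device)
        labels = labels.to(device)
        target_scores = target_scores.to(device).float()

        optimizer.zero_grad()
        pred_scores = model(input_features, labels)  # WhisperPronunciationScorer.forward
        loss = criterion(pred_scores, target_scores)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss / len(train_loader)
```

Since the scoring head is an unconstrained linear layer, its raw output is not guaranteed to stay within 1 to 5, so you may want to clamp the returned value (e.g. `min(max(score, 1.0), 5.0)`) before displaying it.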