tdns03/whisper-small-korean-pronunciation-scorer-sampledata


---
language: ko
tags:
- audio
- speech-recognition
- pronunciation-assessment
license: apache-2.0
datasets:
- AI_Hub
metrics:
- 1~5
widget:
- text: 안녕하세요. 오늘 날씨가 좋습니다.
  example_title: Sample Korean Sentence
- text: 영어는 세계 공용어입니다.
  example_title: Another Sample Sentence
pipeline_tag: audio-classification
---

# Whisper Fine-tuned Pronunciation Scorer

This model scores the pronunciation quality of Korean speech. It is based on `openai/whisper-small`, fine-tuned on the Korean pronunciation evaluation dataset for foreign speakers from Korea's AI-Hub (https://www.aihub.or.kr/).

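# Model Description
The Pronunciation Scorer takes an audio clip together with its reference text transcript and predicts a Korean pronunciation score on a scale of 1 to 5. The Whisper encoder extracts speech features from the audio, and the transcript is fed through the Whisper decoder. The decoder's final-layer hidden states are averaged over all token positions and passed through an additional linear layer that outputs a single score. Because this head has no output activation, the raw prediction is not strictly bounded to the 1 to 5 range.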
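To make the shapes in the scoring head concrete, here is a minimal NumPy sketch of "mean-pool the decoder states, then apply one linear layer". It uses random arrays in place of real Whisper decoder states and trained weights, so the numbers are meaningless; only the shape flow mirrors the model. `D_MODEL = 768` is the hidden size of `openai/whisper-small`. `clamp_to_scale` is a hypothetical post-processing step, not something the model card says the author applies.

```python
import numpy as np

D_MODEL = 768  # hidden size (d_model) of openai/whisper-small


def score_head(decoder_hidden: np.ndarray, weight: np.ndarray, bias: float) -> np.ndarray:
    """Mean-pool decoder states over tokens, then apply a linear layer.

    decoder_hidden: (batch, num_tokens, d_model)
    weight:         (d_model,)
    returns:        (batch,) raw, unbounded scores
    """
    pooled = decoder_hidden.mean(axis=1)  # (batch, d_model)
    return pooled @ weight + bias         # (batch,)


def clamp_to_scale(raw: np.ndarray, low: float = 1.0, high: float = 5.0) -> np.ndarray:
    """Hypothetical post-processing: force raw predictions into the 1-5 range."""
    return np.clip(raw, low, high)


# Random stand-ins for real decoder states and trained weights
rng = np.random.default_rng(0)
hidden = rng.standard_normal((2, 7, D_MODEL))  # 2 clips, 7 transcript tokens each
weight = rng.standard_normal(D_MODEL) / np.sqrt(D_MODEL)
raw = score_head(hidden, weight, bias=3.0)
print(raw.shape, clamp_to_scale(raw))
```

Clamping is optional. Whether you clip, round to the nearest integer, or keep the raw value depends on how you plan to compare scores with human ratings.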

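# How to Use
To use this model, follow these steps:

1. Install the required libraries (`torch`, `torchaudio`, `transformers`).
2. Load the model and processor (the base `openai/whisper-small` weights plus the fine-tuned `state_dict`).
3. Prepare your audio file and its text transcript.
4. Predict the pronunciation score.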
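Step 3 is where input problems usually arise. Whisper's feature extractor expects mono audio at 16 kHz and pads or truncates every clip to a 30-second window, so anything past 30 seconds is silently dropped. The NumPy sketch below shows checks worth running before scoring. `to_mono`, `needs_resampling`, and `fits_whisper_window` are hypothetical helpers, not part of this model or of `transformers`. Resampling itself is left to `torchaudio.functional.resample`.

```python
import numpy as np

WHISPER_SR = 16_000        # sampling rate Whisper's feature extractor expects
WHISPER_MAX_SECONDS = 30   # longer clips are truncated to the first 30 s


def to_mono(waveform: np.ndarray) -> np.ndarray:
    """Downmix (channels, samples), the layout torchaudio.load returns, to 1-D."""
    waveform = np.asarray(waveform, dtype=np.float32)
    if waveform.ndim == 2:
        waveform = waveform.mean(axis=0)
    return waveform


def needs_resampling(sample_rate: int) -> bool:
    """True if the clip must be resampled to 16 kHz before feature extraction."""
    return sample_rate != WHISPER_SR


def fits_whisper_window(num_samples: int, sample_rate: int) -> bool:
    """True if the clip is at most 30 s long, so none of it is cut off."""
    return num_samples / sample_rate <= WHISPER_MAX_SECONDS


# One second of 48 kHz stereo audio: a silent channel and a constant channel
stereo = np.stack([np.zeros(48_000), np.ones(48_000)])
mono = to_mono(stereo)
print(mono.shape, needs_resampling(48_000), fits_whisper_window(mono.size, 48_000))
```

Downmixing matters because a stereo file passed through unchanged would be treated as a batch of two separate clips.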

Here is a complete example of how to use the model:

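```python
import torch
import torch.nn as nn
import torchaudio
from transformers import WhisperProcessor, WhisperForConditionalGeneration


class WhisperPronunciationScorer(nn.Module):
    """Whisper encoder-decoder with a linear regression head on top."""

    def __init__(self, pretrained_model):
        super().__init__()
        self.whisper = pretrained_model
        # Maps the pooled decoder representation (d_model) to a single score
        self.score_head = nn.Linear(self.whisper.config.d_model, 1)

    def forward(self, input_features, labels=None):
        # Passing labels makes Whisper teacher-force the transcript through the decoder
        outputs = self.whisper(input_features, labels=labels, output_hidden_states=True)
        # Output of the final decoder layer: (batch, num_tokens, d_model)
        last_hidden_state = outputs.decoder_hidden_states[-1]
        # Mean-pool over tokens, then project to one score per example: (batch,)
        scores = self.score_head(last_hidden_state.mean(dim=1)).squeeze(-1)
        return scores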

def load_model(model_path, device):
    model_name = "openai/whisper-small"
    processor = WhisperProcessor.from_pretrained(model_name)
    pretrained_model = WhisperForConditionalGeneration.from_pretrained(model_name)
    model = WhisperPronunciationScorer(pretrained_model).to(device)
    # model_path holds a state_dict for WhisperPronunciationScorer (Whisper + score_head)
    state_dict = torch.load(model_path, map_location=device, weights_only=True)
    model.load_state_dict(state_dict)
    model.eval()
    return model, processor

def predict_pronunciation_score(model, processor, audio_path, transcript, device):
    # Load audio as (channels, samples) and downmix to mono
    audio, sr = torchaudio.load(audio_path)
    audio = audio.mean(dim=0)
    # Whisper expects 16 kHz input
    if sr != 16000:
        audio = torchaudio.functional.resample(audio, sr, 16000)
    # Log-Mel features, padded or truncated to a 30 s window
    input_features = processor(
        audio.numpy(), sampling_rate=16000, return_tensors="pt"
    ).input_features.to(device)

    # Tokenize the reference transcript; it is fed to the decoder as labels
    labels = processor(text=transcript, return_tensors="pt").input_ids.to(device)

    # Predict score
    with torch.no_grad():
        score = model(input_features, labels=labels)
    return score.item()

# Load model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_path = "path/to/your/model.pth"
model, processor = load_model(model_path, device)

# Run prediction
audio_path = "path/to/your/audio.wav"
transcript = "안녕하세요"  # "Hello"; should match what is spoken in the audio
score = predict_pronunciation_score(model, processor, audio_path, transcript, device)
print(f"Predicted pronunciation score: {score:.2f}")
```
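In `forward`, the tokenized transcript is passed as `labels`. When no explicit decoder inputs are given, `WhisperForConditionalGeneration` builds them by shifting the labels one position to the right (teacher forcing), which is how the transcript conditions the decoder states that the score head pools over. The pure-Python sketch below mirrors that shift on plain lists. `shift_right` and the token IDs are illustrative only; the real model uses its tokenizer's IDs together with `config.decoder_start_token_id` and `config.pad_token_id`.

```python
from typing import List


def shift_right(labels: List[int], decoder_start_id: int, pad_id: int) -> List[int]:
    """Build decoder inputs from labels for teacher forcing.

    Prepends the decoder start token, drops the last label, and replaces the
    ignore index -100 with the pad token.
    """
    shifted = [decoder_start_id] + labels[:-1]
    return [pad_id if tok == -100 else tok for tok in shifted]


# Made-up token IDs; real ones come from the Whisper tokenizer and config
labels = [11, 12, 13, 99]
decoder_inputs = shift_right(labels, decoder_start_id=1, pad_id=0)
print(decoder_inputs)  # [1, 11, 12, 13]
```

One consequence is that the pooled representation averages over exactly `len(labels)` positions, including the special tokens the tokenizer adds around the transcript.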