---
license: apache-2.0
datasets:
- Murple/ksponspeech
language:
- ko
metrics:
- cer
- wer
pipeline_tag: automatic-speech-recognition
---
# Whisper-Medium-KsponSpeech
Whisper-medium fine-tuned on [KsponSpeech](https://huggingface.co/datasets/Murple/ksponspeech) for Korean automatic speech recognition.
### Model Description
- **Developed by:** [yw0nam](https://github.com/yw0nam)
- **Shared by:** [yw0nam](https://github.com/yw0nam)
- **Model type:** Automatic speech recognition (ASR)
- **License:** apache-2.0
## Uses
```
import librosa
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load the processor and the fine-tuned model (.cuda() requires a CUDA GPU)
processor = WhisperProcessor.from_pretrained("openai/whisper-medium", language="ko", task="transcribe")
model = WhisperForConditionalGeneration.from_pretrained('spow12/whisper-medium-zeroth_korean').cuda()

# wav_path is the path to your audio file; it is resampled to 16 kHz on load
data, _ = librosa.load(wav_path, sr=16000)
input_features = processor(data, sampling_rate=16000, return_tensors="pt").input_features.cuda()
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```
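The snippet above transcribes a single clip, and by default the Whisper feature extractor pads or truncates its input to 30 seconds. For longer recordings, one simple option is to split the waveform into consecutive windows and transcribe each one in turn. The helper below is a minimal sketch of that idea; `split_into_chunks` and its parameters are hypothetical names, not part of this model or of `transformers`.

```python
def split_into_chunks(samples, sampling_rate=16000, chunk_seconds=30):
    """Split a 1-D audio sequence into consecutive chunks of at most chunk_seconds.

    Works on anything that supports len() and slicing (lists, NumPy arrays).
    The last chunk may be shorter than the others.
    """
    chunk_size = sampling_rate * chunk_seconds
    return [samples[start:start + chunk_size]
            for start in range(0, len(samples), chunk_size)]


# Toy example: 70 "samples" at 1 Hz, split into 30-second windows
chunks = split_into_chunks(list(range(70)), sampling_rate=1, chunk_seconds=30)
print([len(c) for c in chunks])
```

Each chunk can then be passed through `processor` and `model.generate` as shown above, and the resulting strings joined. Splitting at fixed offsets can cut a word in half at a boundary, so overlapping windows or silence-based splitting may give cleaner results.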
### Metrics
| Metric | Result |
| --- | --- |
| WER | 3.96 |
| CER | 1.71 |
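
For readers unfamiliar with these metrics: word error rate (WER) and character error rate (CER) are both the edit distance between a reference transcript and the model output, divided by the reference length, counted in words and characters respectively. The sketch below illustrates the definitions in plain Python. It is not the evaluation script used for this model, and the choice to ignore spaces when computing CER is an assumption; conventions vary between toolkits.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate; the reference must contain at least one word."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate with spaces removed (an assumption, see above)."""
    ref_chars = reference.replace(" ", "")
    return edit_distance(ref_chars, hypothesis.replace(" ", "")) / len(ref_chars)


reference = "오늘 날씨가 좋다"
hypothesis = "오늘 날씨 좋다"
print(f"WER: {wer(reference, hypothesis):.3f}")
print(f"CER: {cer(reference, hypothesis):.3f}")
```

Both functions return a fraction; multiply by 100 to express the result as a percentage.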