whisper-medium-ko-normalized-1273h

This model is a fine-tuned version of openai/whisper-medium on a custom dataset for improving Korean speech recognition. It achieves the following results on the evaluation set:

  • Loss: 0.1254
  • Wer: 0.0551

Model description

This model is a fine-tuned version of openai/whisper-medium that transcribes Korean audio into text. It was trained on a GCP a2-highgpu-1g instance (A100 40GB) for 26 hours, at a cost of about $90.

Intended uses & limitations

This model was trained to improve on the performance of the original Whisper model for the Korean transcription task.
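
A minimal transcription sketch using the standard transformers automatic-speech-recognition pipeline is shown below; the repository id and the audio file path are placeholders, not values stated in this card.

```python
# Minimal usage sketch. The repo id below is a placeholder; substitute the actual
# Hub id of this model. The audio path is illustrative.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="<user>/whisper-medium-ko-normalized-1273h",  # placeholder repo id
    chunk_length_s=30,  # Whisper operates on 30-second windows
)

# Transcribe a local Korean audio file (path is illustrative).
result = asr(
    "sample_ko.wav",
    generate_kwargs={"language": "korean", "task": "transcribe"},
)
print(result["text"])
```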

Training and evaluation data

I downloaded all data from AI-HUB (https://aihub.or.kr/). Two datasets in particular caught my attention: "Instruction Audio Set" and "Noisy Conversation Audio Set". The table below lists the number of hours in each dataset.

| dataset name | train_split (hours) | validation_split (hours) |
|---|---|---|
| Instruction Audio Set | 910 | 105 |
| Noisy Conversation Audio Set | 363 | 76 |
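
The card does not describe the preprocessing pipeline in detail, but a typical Whisper fine-tuning setup converts each (audio, transcript) pair into log-mel input features and token-id labels. The sketch below assumes that pattern; the file paths and column names are illustrative, not taken from the AI-HUB data.

```python
# Assumed data-preparation sketch for Whisper fine-tuning; paths and column
# names are illustrative placeholders, not the actual AI-HUB layout.
from datasets import Dataset, Audio
from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained(
    "openai/whisper-medium", language="korean", task="transcribe"
)

# Illustrative: a small table of audio file paths and reference transcripts.
raw = Dataset.from_dict({
    "audio": ["clip_0001.wav", "clip_0002.wav"],
    "sentence": ["first transcript", "second transcript"],
}).cast_column("audio", Audio(sampling_rate=16_000))

def prepare(example):
    audio = example["audio"]
    # Log-mel features for the encoder, token ids for the decoder labels.
    example["input_features"] = processor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    example["labels"] = processor.tokenizer(example["sentence"]).input_ids
    return example

dataset = raw.map(prepare, remove_columns=["audio", "sentence"])
```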

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 24
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 3
  • mixed_precision_training: Native AMP
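
For reference, these hyperparameters map onto Seq2SeqTrainingArguments roughly as follows; output_dir, the per-epoch evaluation strategy, and other settings not listed above are assumptions rather than values from the original run.

```python
# Sketch of the hyperparameters above as Seq2SeqTrainingArguments.
# output_dir and evaluation_strategy are assumptions, not from the original run.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-medium-ko-normalized-1273h",  # illustrative
    learning_rate=1e-5,
    per_device_train_batch_size=24,
    per_device_eval_batch_size=32,
    seed=42,
    warmup_steps=100,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    fp16=True,                      # Native AMP mixed-precision training
    evaluation_strategy="epoch",    # assumption, consistent with the per-epoch results table
    predict_with_generate=True,     # generate transcripts so WER can be computed
)
```

The Adam settings listed above (betas=(0.9, 0.999), epsilon=1e-08) are the Trainer defaults, so they do not need to be set explicitly.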

Training results

| Training Loss | Epoch | Step | Validation Loss | Wer |
|---|---|---|---|---|
| 0.0588 | 1.0 | 8775 | 0.1225 | 0.0604 |
| 0.0287 | 2.0 | 17550 | 0.1186 | 0.0567 |
| 0.0148 | 3.0 | 26325 | 0.1254 | 0.0551 |

Framework versions

  • Transformers 4.28.0.dev0
  • Pytorch 1.13.1+cu117
  • Datasets 2.11.0
  • Tokenizers 0.13.2

Evaluation results on the dataset google/fleurs

The trained model was evaluated on the test split of the ko_kr subset of the google/fleurs dataset. Please note that the model was not trained on the train split of that dataset.

| model | Wer |
|---|---|
| openai/whisper | 0.2469 |
| this model | 0.2189 |
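
A minimal sketch of how this FLEURS evaluation could be reproduced with the transformers pipeline and the evaluate library is shown below; the repo id is a placeholder and the decoding settings of the original evaluation are not stated in this card.

```python
# Sketch of a WER evaluation on the FLEURS ko_kr test split.
# The repo id is a placeholder; decoding settings are assumptions.
import evaluate
from datasets import load_dataset
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="<user>/whisper-medium-ko-normalized-1273h",  # placeholder repo id
    chunk_length_s=30,
)
wer_metric = evaluate.load("wer")

fleurs_test = load_dataset("google/fleurs", "ko_kr", split="test")

predictions, references = [], []
for sample in fleurs_test:
    out = asr(sample["audio"], generate_kwargs={"language": "korean", "task": "transcribe"})
    predictions.append(out["text"])
    references.append(sample["transcription"])

print("WER:", wer_metric.compute(predictions=predictions, references=references))
```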