metadata
library_name: transformers
language:
- ug
license: apache-2.0
base_model: openai/whisper-small
tags:
- generated_from_trainer
metrics:
- cer
- wer
model-index:
- name: Whisper Small Fine-tuned with Uyghur Common Voice
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 15
      type: mozilla-foundation/common_voice_15_0
    metrics:
    - name: Wer
      type: wer
      value: 28.29947071879802
    - name: Cer
      type: cer
      value: 10.896777936451267
Whisper Small Fine-tuned with Uyghur Common Voice
This model is a fine-tuned version of openai/whisper-small, trained on the Uyghur portion of the Common Voice dataset.
It achieves the following results on the evaluation set:
- Loss: 1.5920
- Wer Ortho: 42.9701
- Wer: 28.2995
- Cer: 10.8968
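For readers unfamiliar with the metrics listed above, the following is a minimal sketch of how word error rate (WER) and character error rate (CER) are commonly defined: the edit distance between reference and hypothesis, divided by the reference length, expressed in percent. The card does not say which implementation produced its numbers, so the function names here are illustrative and the toy English strings are not taken from the evaluation data.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete an item from the reference
                curr[j - 1] + 1,     # insert an item from the hypothesis
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent (whitespace tokenization is an assumption)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent, counting spaces as characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    print(f"WER: {wer(ref, hyp):.2f}")  # 2 word edits over 6 reference words
    print(f"CER: {cer(ref, hyp):.2f}")  # 5 character edits over 22 characters
```

A single wrong character makes the whole word count as an error for WER but only one character for CER, which is why CER is usually the lower of the two figures.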
Training and evaluation data
The model was trained on the combined train and dev splits of common_voice_15_0 (11,215 recordings, about 20 hours of audio).
It was evaluated on the test set of THUYG20, which serves as a standard benchmark for Uyghur speech models.
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 300
- training_steps: 4000
- mixed_precision_training: Native AMP
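The scheduler settings above (linear schedule, 300 warmup steps, 4000 training steps, peak learning rate 0.0001) can be sketched as a plain function of the step number. This assumes the usual linear-warmup, linear-decay-to-zero shape; the card states only the scheduler type and step counts, so the function name and exact rounding are illustrative rather than the training code itself.

```python
def linear_schedule_lr(step, base_lr=1e-4, warmup_steps=300, total_steps=4000):
    """Learning rate at an optimizer step under linear warmup then linear decay.

    Assumed shape: ramp from 0 to base_lr over warmup_steps, then decay
    linearly so the rate reaches 0 at total_steps.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 150, 300, 2150, 4000):
        print(step, linear_schedule_lr(step))
```

Under these assumptions the rate peaks at step 300 and is back to half the peak at step 2150, the midpoint of the decay phase.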
Training results
Training Loss | Epoch | Step | Validation Loss | Wer Ortho | Wer | Cer |
---|---|---|---|---|---|---|
0.574400 | 0.7133 | 500 | 1.413890 | 59.765522 | 48.561550 | 17.639905 |
0.299600 | 1.4265 | 1000 | 1.283326 | 52.819004 | 41.377838 | 14.717958 |
0.130600 | 2.1398 | 1500 | 1.379338 | 52.265742 | 38.953389 | 16.260934 |
0.122500 | 2.8531 | 2000 | 1.313730 | 50.245894 | 36.494793 | 14.762585 |
0.060500 | 3.5663 | 2500 | 1.434626 | 47.589356 | 32.998976 | 12.185938 |
0.019500 | 4.2796 | 3000 | 1.526625 | 45.345570 | 30.975756 | 11.307346 |
0.015300 | 4.9929 | 3500 | 1.531676 | 44.120488 | 29.285470 | 11.690021 |
0.003300 | 5.7061 | 4000 | 1.592020 | 42.970054 | 28.299471 | 10.896778 |
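The Epoch column follows from the hyperparameters and the training set size. The sketch below assumes a single device, that the last partial batch of each epoch is kept, and that the number of optimizer updates per epoch is rounded down; none of this is stated in the card, but under these assumptions the arithmetic reproduces the epoch values reported at steps 500, 2000 and 4000.

```python
import math

NUM_EXAMPLES = 11215   # train + dev recordings, from the card
TRAIN_BATCH_SIZE = 8   # from the card
GRAD_ACCUM_STEPS = 2   # from the card

# Effective batch size per optimizer update (matches total_train_batch_size: 16).
total_train_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS

# Assumption: the last partial batch is kept, then updates per epoch is floored.
batches_per_epoch = math.ceil(NUM_EXAMPLES / TRAIN_BATCH_SIZE)
updates_per_epoch = batches_per_epoch // GRAD_ACCUM_STEPS


def epoch_at(step):
    """Fractional epoch reached after a given number of optimizer steps."""
    return step / updates_per_epoch


if __name__ == "__main__":
    print("effective batch size:", total_train_batch_size)
    print("updates per epoch:", updates_per_epoch)
    for step in (500, 2000, 4000):
        print(step, round(epoch_at(step), 4))
```

At 4000 steps the model has therefore seen the training data a little under six times.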
Framework versions
- Transformers 4.46.2
- Pytorch 2.5.1+cu121
- Datasets 3.1.0
- Tokenizers 0.20.3