Whisper Small Fine-tuned with Uyghur Common Voice
This model is a fine-tuned version of openai/whisper-small on the Uyghur Common Voice dataset.
As a proof of concept, only 3264 recordings (~5.5 hours of audio) were used for training and 937 recordings (~1.5 hours of audio) for validation. The full dataset for Uyghur and other languages is available at https://commonvoice.mozilla.org/en/datasets.
This model achieves the following results on the evaluation set:
- Loss: 0.5105
- Wer Ortho: 41.6377
- Wer: 34.9961
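For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words and usually reported as a percentage. The sketch below is a minimal pure-Python illustration for a single pair of strings, not the evaluation script used for this model. The function name `word_error_rate` is hypothetical, and the card does not say how "Wer Ortho" and "Wer" differ; a common convention is that the first is computed on raw text and the second after text normalization, but that is an assumption here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the WER in percent between one reference and one hypothesis."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


# One substitution (b -> x) and one deletion (d) over 4 reference words.
print(word_error_rate("a b c d", "a x c"))
```

Scores reported over a whole evaluation set are normally pooled across utterances (total errors divided by total reference words) rather than averaged per utterance.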
Model description
More information needed
Intended uses & limitations
More information needed
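A minimal usage sketch follows, assuming the checkpoint is published under the repository ID `ixxan/whisper-small-common-voice-ug` (the name shown in the model tree on this page) and that `transformers`, PyTorch, and `ffmpeg` are installed. The helper name `transcribe` and the file name `sample.wav` are hypothetical.

```python
def transcribe(audio_path: str,
               model_id: str = "ixxan/whisper-small-common-voice-ug") -> str:
    """Transcribe one audio file with the fine-tuned checkpoint."""
    # Imported lazily so that defining this helper does not itself
    # require the library to be installed.
    from transformers import pipeline

    # Downloads the checkpoint from the Hub on first use (assumed repo ID).
    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path)
    return result["text"]
```

Calling `transcribe("sample.wav")` should return the recognized text as a string. Given the small training set and the error rates reported above, the output is best treated as a rough draft rather than a finished transcript.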
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 50
- training_steps: 500
- mixed_precision_training: Native AMP
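To make the schedule concrete, here is a small sketch of how a linear scheduler with warmup typically behaves with the values listed above: the learning rate rises linearly from 0 to 0.0001 over the first 50 steps, then decays linearly to 0 at step 500. This illustrates the shape only and is not the library's code; the function name `linear_schedule_lr` is hypothetical.

```python
def linear_schedule_lr(step: int,
                       base_lr: float = 1e-4,
                       warmup_steps: int = 50,
                       total_steps: int = 500) -> float:
    """Learning rate at a given optimizer step for linear warmup + linear decay."""
    if step < warmup_steps:
        # Warmup phase: scale up linearly from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale down linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 25, 50, 275, 500):
    print(step, linear_schedule_lr(step))
```

With only 500 steps in total, a tenth of the run is spent in warmup, and the peak learning rate is reached at exactly one step.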
Training results
Training Loss | Epoch | Step | Validation Loss | Wer Ortho | Wer |
---|---|---|---|---|---|
0.0574 | 2.4510 | 500 | 0.5105 | 41.6377 | 34.9961 |
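The fractional epoch count in the table can be checked against the hyperparameters and the training-set size stated at the top of this card. The arithmetic below assumes training on a single device (consistent with a total batch size of 16 = 8 × 2) and that every step consumed a full batch; the variable names are illustrative.

```python
train_batch_size = 8
gradient_accumulation_steps = 2
training_steps = 500
num_train_recordings = 3264  # training-set size stated in this card

# Samples contributing to each optimizer update.
effective_batch_size = train_batch_size * gradient_accumulation_steps

# Total samples processed over the whole run.
samples_seen = effective_batch_size * training_steps

# Passes over the training set.
epochs = samples_seen / num_train_recordings

print(effective_batch_size, samples_seen, round(epochs, 4))  # 16 8000 2.451
```

In other words, the model saw each training recording roughly two and a half times.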
Framework versions
- Transformers 4.44.2
- Pytorch 2.5.0+cu121
- Datasets 3.1.0
- Tokenizers 0.19.1
Model tree for ixxan/whisper-small-common-voice-ug
Base model
openai/whisper-small