whisper-swahili-small
This model is a fine-tuned version of openai/whisper-small on the Common Voice 11 dataset. It achieves the following results on the evaluation set:
- Loss: 0.3392
- Wer: 24.1789
Model description
- Developed by: iLabAfrica
- Model type: automatic-speech-recognition (ASR)
- License: Mozilla.org
- Finetuned from model: openai/whisper-small
Intended uses & limitations
The model is intended as an exploratory exercise toward developing a better ASR model for Swahili-English speech (see the inference sketch below).
Recommended test audio should:
- Involve local Swahili dialects, names, and terms.
- Feature Swahili accents, i.e. Kenyan and Tanzanian accents.
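A minimal inference sketch using the transformers pipeline is shown below; the repository id `iLabAfrica/whisper-swahili-small` and the sample file name are assumptions for illustration.

```python
# Minimal inference sketch with the Hugging Face transformers ASR pipeline.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="iLabAfrica/whisper-swahili-small",  # assumed repository id
    chunk_length_s=30,  # Whisper works on 30-second audio windows
)

result = asr("swahili_sample.wav")  # hypothetical local audio file
print(result["text"])
```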
Training and evaluation data
- Training Data: 37,000 audio samples (average 6 seconds each) from the Swahili subset of the Common Voice 11 dataset.
- Test Data: 11,000 audio samples from the same dataset (see the loading sketch below).
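A sketch of loading this data with the datasets library, assuming the `mozilla-foundation/common_voice_11_0` Swahili (`sw`) configuration and resampling to the 16 kHz rate Whisper expects:

```python
# Sketch of loading the Swahili subset of Common Voice 11.
# The dataset is gated on the Hub, so accepting its terms and logging in
# with `huggingface-cli login` may be required.
from datasets import load_dataset, Audio

common_voice = load_dataset("mozilla-foundation/common_voice_11_0", "sw")
# Whisper's feature extractor expects 16 kHz audio.
common_voice = common_voice.cast_column("audio", Audio(sampling_rate=16_000))

print(common_voice["train"].num_rows, common_voice["test"].num_rows)
```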
Training hyperparameters
The following hyperparameters were used during training (see the sketch after the list):
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 4000
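A sketch of how these settings could be expressed as transformers Seq2SeqTrainingArguments; the output directory, evaluation cadence, and fp16 flag are assumptions not stated above.

```python
# Sketch of training arguments mirroring the hyperparameters listed above.
# The default AdamW optimizer already uses betas=(0.9, 0.999) and eps=1e-8.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-swahili-small",  # assumed path
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=4000,
    evaluation_strategy="steps",
    eval_steps=1000,              # matches the evaluation points in the results table
    predict_with_generate=True,   # compute WER on generated transcripts
    fp16=True,                    # assumption; common for Whisper fine-tuning on GPU
)
```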
Training results
| Training Loss | Epoch | Step | Validation Loss | Wer |
|---------------|-------|------|-----------------|-----|
| 0.2967 | 0.43 | 1000 | 0.4357 | 29.2499 |
| 0.2754 | 0.87 | 2000 | 0.3660 | 24.9526 |
| 0.1482 | 1.30 | 3000 | 0.3488 | 24.8468 |
| 0.1358 | 1.74 | 4000 | 0.3392 | 24.1789 |
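The Wer column is a percentage word error rate; a minimal sketch of computing it with the evaluate library (the transcripts below are hypothetical):

```python
# Sketch of the WER computation behind the table above.
import evaluate

wer_metric = evaluate.load("wer")

predictions = ["habari ya asubuhi"]  # hypothetical model transcript
references = ["habari za asubuhi"]   # hypothetical reference transcript
wer = 100 * wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.2f}")
```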
Limitations & Biases
The model struggles with noisy audio and Kenyan slang.
Framework versions
- Transformers 4.33.0.dev0
- Pytorch 2.0.1+cu117
- Datasets 2.14.3
- Tokenizers 0.13.3
Future Work
The model could be improved further by training on more diverse datasets and by refining its performance on noisy audio and slang.