whisper-swahili-small
This model is a fine-tuned version of openai/whisper-small on the Common Voice 11 dataset. It achieves the following results on the evaluation set:
- Loss: 0.3392
- Wer: 24.1789
Model description
- Developed by: iLabAfrica
- Model type: automatic-speech-recognition (ASR)
- License: Mozilla.org
- Finetuned from model: openai/whisper-small
Intended uses & limitations
The model is intended as an exploratory exercise toward developing a better ASR model for Swahili-English speech (see the inference sketch below).
Recommended test audio should:
- Involve local Swahili dialects, names, and terms.
- Feature Swahili accents, i.e. Kenyan and Tanzanian accents.
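A minimal inference sketch using the transformers pipeline is shown below; the repository id `iLabAfrica/whisper-swahili-small` and the sample file name are assumptions for illustration.

```python
# Minimal inference sketch with the Hugging Face transformers ASR pipeline.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="iLabAfrica/whisper-swahili-small",  # assumed repository id
    chunk_length_s=30,  # Whisper works on 30-second audio windows
)

result = asr("swahili_sample.wav")  # hypothetical local audio file
print(result["text"])
```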
Training and evaluation data
- Training Data: 37,000 audio samples (average 6 seconds each) from the Swahili subset of the Common Voice 11 dataset.
- Test Data: 11,000 audio samples from the same dataset (see the loading sketch below).
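A sketch of loading this data with the datasets library, assuming the `mozilla-foundation/common_voice_11_0` Swahili (`sw`) configuration and resampling to the 16 kHz rate Whisper expects:

```python
# Sketch of loading the Swahili subset of Common Voice 11.
# The dataset is gated on the Hub, so accepting its terms and logging in
# with `huggingface-cli login` may be required.
from datasets import load_dataset, Audio

common_voice = load_dataset("mozilla-foundation/common_voice_11_0", "sw")
# Whisper's feature extractor expects 16 kHz audio.
common_voice = common_voice.cast_column("audio", Audio(sampling_rate=16_000))

print(common_voice["train"].num_rows, common_voice["test"].num_rows)
```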
Training hyperparameters
The following hyperparameters were used during training (see the sketch after the list):
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 4000
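A sketch of how these settings could be expressed as transformers Seq2SeqTrainingArguments; the output directory, evaluation cadence, and fp16 flag are assumptions not stated above.

```python
# Sketch of training arguments mirroring the hyperparameters listed above.
# The default AdamW optimizer already uses betas=(0.9, 0.999) and eps=1e-8.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-swahili-small",  # assumed path
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=4000,
    evaluation_strategy="steps",
    eval_steps=1000,              # matches the evaluation points in the results table
    predict_with_generate=True,   # compute WER on generated transcripts
    fp16=True,                    # assumption; common for Whisper fine-tuning on GPU
)
```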
Training results
| Training Loss | Epoch | Step | Validation Loss | Wer |
|---------------|-------|------|-----------------|-----|
| 0.2967 | 0.43 | 1000 | 0.4357 | 29.2499 |
| 0.2754 | 0.87 | 2000 | 0.3660 | 24.9526 |
| 0.1482 | 1.30 | 3000 | 0.3488 | 24.8468 |
| 0.1358 | 1.74 | 4000 | 0.3392 | 24.1789 |
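The Wer column is a percentage word error rate; a minimal sketch of computing it with the evaluate library (the transcripts below are hypothetical):

```python
# Sketch of the WER computation behind the table above.
import evaluate

wer_metric = evaluate.load("wer")

predictions = ["habari ya asubuhi"]  # hypothetical model transcript
references = ["habari za asubuhi"]   # hypothetical reference transcript
wer = 100 * wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.2f}")
```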
Limitations & Biases
The model struggles with noisy audio and Kenyan slang.
Framework versions
- Transformers 4.33.0.dev0
- Pytorch 2.0.1+cu117
- Datasets 2.14.3
- Tokenizers 0.13.3
Future Work
The model could be improved further by training on more diverse datasets and by refining its performance on noisy audio and slang.