whisper-swahili-small

This model is a fine-tuned version of openai/whisper-small on the Common Voice 11 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3392
  • WER: 24.1789

Model description

Intended uses & limitations

The model is intended as an exploration exercise toward developing a better ASR model for Swahili.

Recommended audio for testing (see the inference sketch below) should:

  1. Involve local Swahili dialects, names, and terms.
  2. Include Swahili accents, i.e. Kenyan and Tanzanian accents.
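
As a rough illustration, the checkpoint can be tried on a short clip with the 🤗 Transformers ASR pipeline. This is a minimal sketch, assuming the model id PaschalK/whisper-swahili-small and a local audio file sample.wav; the language/task generation arguments are standard Whisper options, not settings confirmed by this card.

```python
# Minimal sketch: transcribe a Swahili clip with the fine-tuned checkpoint.
# Assumptions: model id "PaschalK/whisper-swahili-small" and a local file
# "sample.wav"; the language/task flags are standard Whisper generation options.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="PaschalK/whisper-swahili-small",
)

result = asr(
    "sample.wav",
    generate_kwargs={"language": "swahili", "task": "transcribe"},
)
print(result["text"])
```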

Training and evaluation data

  • Training Data: 37,000 audio samples (average 6 seconds each) from the Swahili subset of the Common Voice 11 dataset (see the loading sketch below).
  • Test Data: 11,000 audio samples from the same dataset.
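
A split like the one above could be loaded with the 🤗 Datasets library roughly as follows; the dataset id mozilla-foundation/common_voice_11_0 and the resampling to 16 kHz (Whisper's expected sampling rate) are assumptions, not details taken from this card.

```python
# Sketch: load the Swahili ("sw") subset of Common Voice 11 and resample to 16 kHz.
# The dataset id below is an assumption based on common Hugging Face Hub naming.
from datasets import load_dataset, Audio

common_voice = load_dataset("mozilla-foundation/common_voice_11_0", "sw", split="train")
common_voice = common_voice.cast_column("audio", Audio(sampling_rate=16_000))
print(common_voice)
```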

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows this list):

  • learning_rate: 1e-05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 4000
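
For readers who want to set up a similar run, the list above maps onto Seq2SeqTrainingArguments roughly as in the sketch below. The output directory is a placeholder, the evaluation interval of 1000 steps is inferred from the results table, and predict_with_generate is a typical choice for computing WER rather than a setting stated in this card.

```python
# Sketch: training arguments mirroring the hyperparameters listed above.
# Adam betas/epsilon match the Transformers defaults; output_dir is a placeholder;
# eval_steps=1000 is inferred from the evaluation cadence in the results table.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-swahili-small",  # placeholder path
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=4000,
    evaluation_strategy="steps",
    eval_steps=1000,
    predict_with_generate=True,  # assumption: typical for WER evaluation
)
```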

Training results

| Training Loss | Epoch | Step | Validation Loss | WER (%) |
|---------------|-------|------|-----------------|---------|
| 0.2967        | 0.43  | 1000 | 0.4357          | 29.2499 |
| 0.2754        | 0.87  | 2000 | 0.3660          | 24.9526 |
| 0.1482        | 1.30  | 3000 | 0.3488          | 24.8468 |
| 0.1358        | 1.74  | 4000 | 0.3392          | 24.1789 |
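
The WER column is the word error rate in percent. As a small illustration of how such a score is computed, here is a sketch using the `evaluate` library (a common choice, though not confirmed by this card); the example strings are illustrative only.

```python
# Sketch: word error rate as reported in the table above (in percent).
# The example strings are illustrative Swahili phrases, not data from the card.
import evaluate

wer_metric = evaluate.load("wer")
predictions = ["nina furaha kubwa leo"]
references = ["nina furaha kubwa sana leo"]
wer = 100 * wer_metric.compute(predictions=predictions, references=references)
print(wer)  # 20.0: one deletion out of five reference words
```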

Limitations & Biases

The model struggles with noisy audio and Kenyan slang.

Framework versions

  • Transformers 4.33.0.dev0
  • Pytorch 2.0.1+cu117
  • Datasets 2.14.3
  • Tokenizers 0.13.3

Future Work

The model could be improved further by training on more diverse datasets and refining performance on noisy audio and slang.
