metadata
language:
  - ar
license: apache-2.0
base_model: openai/whisper-base
tags:
  - whisper
  - Arabic
  - AR
  - speech to text
  - stt
  - transcription
datasets:
  - mozilla-foundation/common_voice_16_0
  - BelalElhossany/mgb2_audios_transcriptions_non_overlap
  - nadsoft/Jordan-Audio
metrics:
  - wer
model-index:
  - name: Whisper base arabic
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        metrics:
          - name: Wer
            type: wer
            value: 34.7

Whisper base Arabic

This model is a fine-tuned version of openai/whisper-base for Arabic speech-to-text. It achieves the following results on the evaluation set:

  • Loss: 0.44
  • WER: 34.7
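For readers who want to try the model, a minimal inference sketch with the Transformers `pipeline` API could look like the following. The repository id `YazanSalameh/Whisper-base-Arabic` is an assumption (author name plus model name) and `transcribe` is a hypothetical helper, not something shipped with the model.

```python
# Assumed repository id (author name plus model name); adjust if it differs.
MODEL_ID = "YazanSalameh/Whisper-base-Arabic"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with the Transformers ASR pipeline."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # The pipeline returns a dict; the transcription is stored under "text".
    return asr(audio_path)["text"]
```

Calling `transcribe("sample.wav")` (a placeholder path) downloads the checkpoint on first use and returns the transcription as a string.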

Training and evaluation data

Train set:

  • mozilla-foundation/common_voice_16_0 (ar, train + validation splits)
  • BelalElhossany/mgb2_audios_transcriptions_non_overlap
  • nadsoft/Jordan-Audio

Validation set: 600 samples in total, drawn from the three datasets. The set was kept small to save time during training, because the model was trained on the Colab free tier. Note: evaluate accuracy in whatever way you see fit.
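For anyone running their own evaluation, word error rate can be computed as a word-level edit distance divided by the number of reference words. The sketch below is a plain-Python illustration of that definition; `word_error_rate` is a hypothetical helper, and it is not necessarily the exact metric implementation used to produce the numbers on this card.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

The function returns a fraction; multiply by 100 to compare with the figures reported here, assuming those are percentages. Apply the same text normalization to references and hypotheses before scoring.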

Training procedure

Arabic diacritics (حركات, harakat) were removed from the transcripts. The model was trained on the combined dataset for 6 epochs; the fifth epoch gave the best result, so the released model is the epoch-5 checkpoint.
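Because the training transcripts had their diacritics removed, reference text used for evaluation probably needs the same treatment. A minimal sketch follows, assuming that "harakat" means the Unicode combining marks U+064B through U+0652; the exact character set the author stripped is not stated, and `strip_harakat` is a hypothetical helper.

```python
import re

# Assumption: harakat = U+064B..U+0652 (tanwin forms, fatha, damma, kasra,
# shadda, sukun). The author's exact set is not documented.
HARAKAT = re.compile("[\u064B-\u0652]")


def strip_harakat(text: str) -> str:
    """Return the text with Arabic short-vowel and related marks removed."""
    return HARAKAT.sub("", text)
```

For example, `strip_harakat("مَرْحَبًا")` yields the undiacritized form `مرحبا`.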

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 32
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 1
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
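To make the scheduler settings concrete, the sketch below computes the learning rate at a given step under a linear schedule with warmup: a linear ramp from 0 to the base rate, then a linear decay to 0. The total step count is an assumption (6 epochs at 1437 steps per epoch, the per-epoch step count in the training results), and `linear_lr` is a hypothetical helper.

```python
BASE_LR = 1e-5          # learning_rate from the list above
WARMUP_STEPS = 500      # lr_scheduler_warmup_steps from the list above
TOTAL_STEPS = 6 * 1437  # assumption: 6 epochs x 1437 optimizer steps per epoch


def linear_lr(step: int) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < WARMUP_STEPS:
        # Warmup: ramp linearly from 0 up to BASE_LR.
        return BASE_LR * step / WARMUP_STEPS
    # Decay: fall linearly from BASE_LR to 0 at TOTAL_STEPS.
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)
```

Under these assumptions the rate peaks at step 500 and has already decayed noticeably by the epoch-5 checkpoint at step 7185.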

Training results

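| Training Loss | Epoch | Step | Validation Loss | WER     |
|---------------|-------|------|-----------------|---------|
| 0.4603        | 1     | 1437 | 0.4931          | 45.8857 |
| 0.2867        | 2     | 2874 | 0.4493          | 36.9973 |
| 0.2494        | 3     | 4311 | 0.4219          | 43.5553 |
| 0.1435        | 4     | 5748 | 0.4408          | 40.2351 |
| 0.1345        | 5     | 7185 | 0.4407          | 34.7081 |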
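The table above can be checked with a few lines of Python. The sketch picks the best epoch by WER and by validation loss, which turn out to differ. The rows are copied from the table; the variable names are illustrative only.

```python
# (epoch, step, training loss, validation loss, WER), copied from the table above
RESULTS = [
    (1, 1437, 0.4603, 0.4931, 45.8857),
    (2, 2874, 0.2867, 0.4493, 36.9973),
    (3, 4311, 0.2494, 0.4219, 43.5553),
    (4, 5748, 0.1435, 0.4408, 40.2351),
    (5, 7185, 0.1345, 0.4407, 34.7081),
]

best_by_wer = min(RESULTS, key=lambda row: row[4])
best_by_loss = min(RESULTS, key=lambda row: row[3])

print("lowest WER at epoch", best_by_wer[0])
print("lowest validation loss at epoch", best_by_loss[0])
```

Epoch 5 has the lowest WER while epoch 3 has the lowest validation loss, which suggests that the "best" checkpoint was selected by WER.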