YazanSalameh
/

Whisper-base-Arabic

Automatic Speech Recognition

YazanSalameh committed on Jan 18

Commit

5aadf6e

1 Parent(s): 0e87e59

Update README.md

Files changed (1)

README.md +69 -4

README.md CHANGED

@@ -1,12 +1,77 @@
 ---
 datasets:
 - mozilla-foundation/common_voice_16_0
 - BelalElhossany/mgb2_audios_transcriptions_non_overlap
 - nadsoft/Jordan-Audio
-language:
-- ar
 ---
-This model has been trained on 3 datasets combined into one big dataset containing 46k arabic audio recordings. The test set is small (600 samples) to save time
-during training as colab free tier was used to train the model.

 ---
+language:
+- ar
+license: apache-2.0
+base_model: openai/whisper-base
+tags:
+- ar-asr-leaderboard
+- generated_from_trainer
+- whisper
+- Arabic
+- AR
+- speech to text
+- stt
 datasets:
 - mozilla-foundation/common_voice_16_0
 - BelalElhossany/mgb2_audios_transcriptions_non_overlap
 - nadsoft/Jordan-Audio
+metrics:
+- wer
+model-index:
+- name: Whisper base arabic
+  results:
+  - task:
+      name: Automatic Speech Recognition
+      type: automatic-speech-recognition
+    # dataset:
+    #   name: Common Voice 15.0
+    #   type: mozilla-foundation/common_voice_15_0
+    #   args: 'config: ar, split: test'
+    metrics:
+    - name: Wer
+      type: wer
+      value: 34.7
 ---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# Whisper base arabic
+This model is a fine-tuned version of openai/whisper-base on Arabic speech data. It achieves the following results on the evaluation set:
+- Loss: 0.44
+- Wer: 34.7
+## Training and evaluation data
+Train set:
+- mozilla-foundation/common_voice_16_0 ar [train+validation]
+- BelalElhossany/mgb2_audios_transcriptions_non_overlap
+- nadsoft/Jordan-Audio
+Test set:
+600 samples in total, drawn from the 3 datasets. The test set was kept small to save time during training, because the Colab free tier was used to train the model.
+evaluate accuracy
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 1e-05
+- train_batch_size: 32
+- eval_batch_size: 16
+- seed: 42
+- gradient_accumulation_steps: 1
+- lr_scheduler_type: linear
+- lr_scheduler_warmup_steps: 500
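
The `linear` scheduler with 500 warmup steps listed above can be illustrated with a small sketch. It assumes the usual linear-warmup, linear-decay shape and takes the total step count (7185) from the last row of the training results table; the function name `linear_schedule_lr` is hypothetical and not part of the card or of any library.

```python
def linear_schedule_lr(step: int,
                       base_lr: float = 1e-5,
                       warmup_steps: int = 500,
                       total_steps: int = 7185) -> float:
    """Learning rate at a given optimizer step.

    Assumption: the rate ramps linearly from 0 to base_lr over the
    warmup steps, then decays linearly to 0 at total_steps.
    """
    if step < warmup_steps:
        # Warmup phase: fraction of the ramp completed so far.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: fraction of the post-warmup steps still remaining.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Print the rate at the end of each epoch reported in the card.
    for step in (1437, 2874, 4311, 5748, 7185):
        print(step, linear_schedule_lr(step))
```

Under these assumptions the rate peaks at 1e-05 at step 500 and reaches zero at the final step, so the later epochs train with a progressively smaller learning rate.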
+### Training results
+| Training Loss | Epoch | Step | Validation Loss | Wer     |
+|:-------------:|:-----:|:----:|:---------------:|:-------:|
+| 0.4603        | 1     | 1437 | 0.4931          | 45.8857 |
+| 0.2867        | 2     | 2874 | 0.4493          | 36.9973 |
+| 0.2494        | 3     | 4311 | 0.4219          | 43.5553 |
+| 0.1435        | 4     | 5748 | 0.4408          | 40.2351 |
+| 0.1345        | 5     | 7185 | 0.4407          | 34.7081 |
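
For readers unfamiliar with the Wer column, the following is a minimal sketch of how a word error rate can be computed: the word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words. It assumes the values in the table are percentages and that whitespace tokenization is adequate; the helper name `word_error_rate` is hypothetical, and the card does not state which implementation produced its numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate in percent for one reference/hypothesis pair."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # Illustrative strings only, not taken from the evaluation set.
    print(word_error_rate("the cat sat down", "the cat sat"))
```

Because insertions count as errors, the rate can exceed 100 when the hypothesis is much longer than the reference.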