This model is a fine-tuned version of openai/whisper-large on the Common Voice 11.0 dataset. It achieves a word error rate (WER) of 12.61.
Data augmentation could further improve the model's performance.
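One common form of the data augmentation suggested above is additive noise at a target signal-to-noise ratio (SNR). The sketch below assumes NumPy. The function name `add_noise_at_snr`, the choice of white Gaussian noise and the 10 dB setting are illustrative assumptions, not something this model was trained with.

```python
import numpy as np


def add_noise_at_snr(audio: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: return `audio` plus white Gaussian noise at the requested SNR in dB."""
    signal_power = np.mean(audio.astype(np.float64) ** 2)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10 ** (SNR_dB / 10)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(loc=0.0, scale=np.sqrt(noise_power), size=audio.shape)
    # Cast back to the input dtype (e.g. float32) so downstream feature extraction sees the same type.
    return (audio + noise).astype(audio.dtype)


# Usage on a synthetic 1-second, 440 Hz tone sampled at 16 kHz (the rate Whisper expects).
rng = np.random.default_rng(seed=0)
sampling_rate = 16_000
t = np.arange(sampling_rate) / sampling_rate
clean = (0.5 * np.sin(2 * np.pi * 440.0 * t)).astype(np.float32)
noisy = add_noise_at_snr(clean, snr_db=10.0, rng=rng)

# Check the SNR actually achieved: clean power divided by the power of the added noise.
added = noisy.astype(np.float64) - clean.astype(np.float64)
measured_snr_db = 10 * np.log10(np.mean(clean.astype(np.float64) ** 2) / np.mean(added ** 2))
print(f"measured SNR: {measured_snr_db:.2f} dB")
```

In a training pipeline, a transform like this would typically be applied to the training split only (for example inside a `datasets.Dataset.map` call before feature extraction), with the SNR drawn at random per clip, while the evaluation data stays clean.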
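## Intended uses & limitations

The snippet below transcribes the first 10 clips of the Arabic test split of Common Voice 11.0 and prints each reference sentence next to the model's prediction.
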
```python
from datasets import Audio, load_dataset
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load the Arabic test split of Common Voice 11.0. The dataset is gated on the Hub:
# accept its terms and log in first (e.g. with `huggingface-cli login`).
test_dataset = load_dataset(
    "mozilla-foundation/common_voice_11_0", "ar", split="test",
    token=True, trust_remote_code=True,
)

# Get the processor and model from mohammed/whisper-large-arabic-cv-11.
model_id = "mohammed/whisper-large-arabic-cv-11"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)
# Clear the forced decoder prompt (language/task tokens) so generation is not constrained by it.
model.config.forced_decoder_ids = None

# Whisper expects 16 kHz audio, so resample each clip on the fly when it is accessed.
test_dataset = test_dataset.cast_column("audio", Audio(sampling_rate=16000))

# Transcribe the first 10 examples and print them next to their references.
for i in range(10):
    example = test_dataset[i]  # decode the audio once per example
    sample = example["audio"]
    input_features = processor(
        sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt"
    ).input_features
    predicted_ids = model.generate(input_features)
    transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
    print(f"{i} Reference Sentence: {example['sentence']}")
    print(f"{i} Predicted Sentence: {transcription[0]}")
```

Example output:

```text
0 Reference Sentence: زارني في أوائل الشهر بدري
0 Predicted Sentence: زارني في أوائل الشهر بدري
1 Reference Sentence: إبنك بطل.
1 Predicted Sentence: ابنك بطل
2 Reference Sentence: الواعظ الأمرد هذا الذي
2 Predicted Sentence: أواعز الأمرج هذا الذي
3 Reference Sentence: سمح له هذا بالتخصص في البرونز الصغير، الذي يتم إنتاجه بشكل رئيسي ومربح للتصدير.
3 Predicted Sentence: سمح له هذا بالتخصص في البلونز الصغير الذي اعتمد منتاجه بشكل رئيسي وغربح للتصدير
4 Reference Sentence: ألديك قلم ؟
4 Predicted Sentence: ألديك قلم
5 Reference Sentence: يا نديمي قسم بي الى الصهباء
5 Predicted Sentence: يا نديمي قسم بي إلى الصحباء
6 Reference Sentence: إنك تكبر المشكلة.
6 Predicted Sentence: إنك تكبر المشكلة
7 Reference Sentence: يرغب أن يلتقي بك.
7 Predicted Sentence: يرغب أن يلتقي بك
8 Reference Sentence: إنهم لا يعرفون لماذا حتى.
8 Predicted Sentence: إنهم لا يعرفون لماذا حتى
9 Reference Sentence: سيسعدني مساعدتك أي وقت تحب.
9 Predicted Sentence: سيسعدني مساعدتك أي وقت تحب
```
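The reported 12.61 is a word error rate: the minimum number of word substitutions, deletions and insertions that turn the reference into the prediction, divided by the number of reference words. The sketch below computes it from scratch on four of the pairs above (samples 1, 2, 4 and 5). The card does not say which text normalization was used for the 12.61 figure, so `normalize_arabic` is a hypothetical example that strips punctuation and unifies alef variants. Since the predictions above omit trailing punctuation, scoring without any normalization inflates the error rate.

```python
import re
from typing import Callable, List, Optional, Sequence


def normalize_arabic(text: str) -> str:
    """Hypothetical light normalizer (an assumption, not the author's recipe)."""
    # Remove every character that is neither a word character nor whitespace:
    # punctuation such as ".", "،" and "؟", and also Arabic diacritics, which are
    # combining marks rather than word characters in Python's `re`.
    text = re.sub(r"[^\w\s]", "", text)
    # Map alef with madda / hamza above / hamza below to bare alef.
    text = re.sub("[\u0622\u0623\u0625]", "\u0627", text)
    # Collapse any leftover runs of whitespace.
    return " ".join(text.split())


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum substitutions + deletions + insertions turning `ref` into `hyp`."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, ref_word in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution (or match)
            )
        prev = cur
    return prev[-1]


def word_error_rate(
    references: Sequence[str],
    hypotheses: Sequence[str],
    normalize: Optional[Callable[[str], str]] = None,
) -> float:
    """Corpus-level WER: total word edits divided by total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    errors, total = 0, 0
    for ref, hyp in zip(references, hypotheses):
        if normalize is not None:
            ref, hyp = normalize(ref), normalize(hyp)
        ref_words, hyp_words = ref.split(), hyp.split()
        errors += word_edit_distance(ref_words, hyp_words)
        total += len(ref_words)
    if total == 0:
        raise ValueError("references contain no words")
    return errors / total


# Samples 1, 2, 4 and 5 from the example output.
REFERENCES = ["إبنك بطل.", "الواعظ الأمرد هذا الذي", "ألديك قلم ؟", "يا نديمي قسم بي الى الصهباء"]
HYPOTHESES = ["ابنك بطل", "أواعز الأمرج هذا الذي", "ألديك قلم", "يا نديمي قسم بي إلى الصحباء"]

print(f"raw WER:        {word_error_rate(REFERENCES, HYPOTHESES):.3f}")
print(f"normalized WER: {word_error_rate(REFERENCES, HYPOTHESES, normalize_arabic):.3f}")
```

On these four pairs the raw WER is 7/15 (about 0.467) and the normalized WER is 3/14 (about 0.214); such a tiny sample says little about the 12.61 reported above. For real evaluations, an established implementation such as the `wer` metric of the `evaluate` library (backed by `jiwer`) is the safer choice.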
## Training and evaluation data
This model is trained on the Common Voice 11.0 dataset.