YazanSalameh committed
Commit 5aadf6e
1 Parent(s): 0e87e59

Update README.md

Files changed (1):
- README.md +69 -4
README.md CHANGED
@@ -1,12 +1,77 @@
 ---
 datasets:
 - mozilla-foundation/common_voice_16_0
 - BelalElhossany/mgb2_audios_transcriptions_non_overlap
 - nadsoft/Jordan-Audio
-language:
-- ar
 ---

-This model has been trained on 3 datasets combined into one big dataset containing 46k arabic audio recordings. The test set is small (600 samples) to save time
-during training as colab free tier was used to train the model.
 ---
+language:
+- ar
+license: apache-2.0
+base_model: openai/whisper-base
+tags:
+- ar-asr-leaderboard
+- generated_from_trainer
+- whisper
+- Arabic
+- AR
+- speech to text
+- stt
 datasets:
 - mozilla-foundation/common_voice_16_0
 - BelalElhossany/mgb2_audios_transcriptions_non_overlap
 - nadsoft/Jordan-Audio
+metrics:
+- wer
+model-index:
+- name: Whisper base arabic
+  results:
+  - task:
+      name: Automatic Speech Recognition
+      type: automatic-speech-recognition
+#    dataset:
+#      name: Common Voice 15.0
+#      type: mozilla-foundation/common_voice_15_0
+#      args: 'config: ar, split: test'
+    metrics:
+    - name: Wer
+      type: wer
+      value: 34.7
 ---
 
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+
+# Whisper base arabic
+This model achieves the following results on the evaluation set:
+- Loss: 0.44
+- Wer: 34.7
+
+## Training and evaluation data
+Train set:
+- mozilla-foundation/common_voice_16_0 ar [train+validation]
+- BelalElhossany/mgb2_audios_transcriptions_non_overlap
+- nadsoft/Jordan-Audio
+
+Test set:
+600 samples in total from the 3 sets, kept small to save time during training, since the Colab free tier was used to train the model and evaluate accuracy.
+
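The card reports word error rate (WER) as its metric. As a minimal sketch (not the card's own evaluation script, which likely used the `evaluate`/`jiwer` libraries), WER is word-level edit distance divided by the reference length:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: Levenshtein distance over words / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # DP table: d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution in a four-word reference -> WER 0.25
print(wer("the cat sat down", "the cat sat up"))  # 0.25
```

A reported WER of 34.7 thus means roughly one word in three is substituted, deleted, or inserted relative to the reference transcript.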
+
+## Training procedure
+
+### Training hyperparameters
+
+The following hyperparameters were used during training:
+- learning_rate: 1e-05
+- train_batch_size: 32
+- eval_batch_size: 16
+- seed: 42
+- gradient_accumulation_steps: 1
+- lr_scheduler_type: linear
+- lr_scheduler_warmup_steps: 500
+
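These hyperparameters map directly onto `Seq2SeqTrainingArguments`, assuming the standard Hugging Face `Seq2SeqTrainer` recipe for fine-tuning Whisper (the `output_dir` below is a placeholder, not from the card):

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the hyperparameters listed above; output_dir is hypothetical.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-base-ar",      # placeholder path
    learning_rate=1e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=1,
    lr_scheduler_type="linear",
    warmup_steps=500,
)
```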
+### Training results
 
+| Training Loss | Epoch | Step | Validation Loss | Wer     |
+|:-------------:|:-----:|:----:|:---------------:|:-------:|
+| 0.4603        | 1     | 1437 | 0.4931          | 45.8857 |
+| 0.2867        | 2     | 2874 | 0.4493          | 36.9973 |
+| 0.2494        | 3     | 4311 | 0.4219          | 43.5553 |
+| 0.1435        | 4     | 5748 | 0.4408          | 40.2351 |
+| 0.1345        | 5     | 7185 | 0.4407          | 34.7081 |