Whisper Medium GA-EN Speech Translation

This model is a fine-tuned version of openai/whisper-medium for Irish-to-English (GA-EN) speech translation, trained on the IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, Wikimedia, and EUbookshop datasets. It achieves the following results on the evaluation set:

  • Loss: 1.0576
  • BLEU: 31.02
  • chrF: 53.51
  • WER: 68.5277
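
The WER column in the training results exceeds 100 at several early checkpoints (for example 222.4674 at step 100). That is possible because word error rate divides the number of word-level edits by the reference length, so a hypothesis much longer than its reference can accumulate more edits than there are reference words. The sketch below uses the textbook word-level Levenshtein definition to illustrate this; the card does not say which tool or text normalization produced its scores, so `word_error_rate` is a hypothetical helper and not the evaluation script.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Textbook WER in percent: word-level edit distance / reference length.

    Illustrative helper only; the card does not specify how its WER
    was computed (tokenization, casing, punctuation handling).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# Three inserted words against a two-word reference: 3 / 2 = 150%.
print(word_error_rate("hello world", "hello there big wide world"))
```
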

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.03
  • training_steps: 4000
  • mixed_precision_training: Native AMP
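
As an illustration of how the scheduler settings above interact, the following minimal sketch computes a linear schedule with warmup: the learning rate rises linearly from 0 to the peak over the warmup steps, then decays linearly to 0 at the final step. It assumes the warmup length is the warmup ratio times the total number of steps, rounded up (120 steps here); `linear_lr` is a hypothetical standalone function, not the scheduler object used in training.

```python
import math


def linear_lr(step: int,
              base_lr: float = 1e-4,       # learning_rate from the list above
              total_steps: int = 4000,     # training_steps
              warmup_ratio: float = 0.03   # lr_scheduler_warmup_ratio
              ) -> float:
    """Learning rate at a given step for linear warmup plus linear decay."""
    # Assumption: warmup steps = ceil(total_steps * warmup_ratio).
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for s in (0, 60, 120, 2000, 4000):
    print(s, linear_lr(s))
```

Under these assumptions the peak rate of 0.0001 is reached at step 120, shortly after the first logged checkpoint at step 100.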

Training results

| Training Loss | Epoch | Step | Validation Loss | BLEU | chrF | WER |
|---|---|---|---|---|---|---|
| 2.5374 | 0.0138 | 100 | 2.1201 | 2.56 | 18.92 | 222.4674 |
| 2.446 | 0.0276 | 200 | 2.1960 | 3.07 | 20.56 | 170.5088 |
| 2.2819 | 0.0414 | 300 | 1.9811 | 5.87 | 25.17 | 114.5880 |
| 2.1904 | 0.0552 | 400 | 1.9974 | 8.41 | 25.65 | 99.1896 |
| 2.026 | 0.0690 | 500 | 1.8961 | 7.99 | 27.64 | 130.7069 |
| 2.0448 | 0.0828 | 600 | 1.9410 | 9.15 | 27.78 | 104.9077 |
| 1.8606 | 0.0966 | 700 | 1.8451 | 9.57 | 29.34 | 110.4908 |
| 1.9887 | 0.1103 | 800 | 1.7419 | 13.44 | 32.32 | 84.3314 |
| 1.8633 | 0.1241 | 900 | 1.7376 | 13.43 | 31.58 | 102.1162 |
| 1.7576 | 0.1379 | 1000 | 1.6879 | 11.9 | 32.68 | 106.6186 |
| 1.7142 | 0.1517 | 1100 | 1.7571 | 12.4 | 33.66 | 102.6114 |
| 1.7168 | 0.1655 | 1200 | 1.6003 | 17.35 | 36.55 | 87.9784 |
| 1.6741 | 0.1793 | 1300 | 1.5883 | 15.41 | 35.46 | 92.8411 |
| 1.6534 | 0.1931 | 1400 | 1.5366 | 17.12 | 37.24 | 90.2296 |
| 1.58 | 0.2069 | 1500 | 1.5141 | 17.49 | 38.5 | 92.1207 |
| 1.403 | 0.2207 | 1600 | 1.4606 | 16.78 | 39.13 | 88.9689 |
| 1.3806 | 0.2345 | 1700 | 1.4263 | 19.26 | 40.02 | 86.7177 |
| 1.5111 | 0.2483 | 1800 | 1.4060 | 18.4 | 39.47 | 92.2557 |
| 1.4261 | 0.2621 | 1900 | 1.3911 | 21.19 | 42.13 | 78.7033 |
| 1.2974 | 0.2759 | 2000 | 1.3871 | 15.6 | 38.66 | 100.3152 |
| 1.2694 | 0.2897 | 2100 | 1.3527 | 16.21 | 39.99 | 91.2652 |
| 1.204 | 0.3034 | 2200 | 1.3232 | 20.2 | 41.18 | 86.8978 |
| 1.1922 | 0.3172 | 2300 | 1.3338 | 16.44 | 40.85 | 103.1968 |
| 1.1237 | 0.3310 | 2400 | 1.2830 | 19.29 | 43.73 | 94.4620 |
| 1.0989 | 0.3448 | 2500 | 1.2844 | 25.11 | 46.84 | 75.0563 |
| 1.0766 | 0.3586 | 2600 | 1.2578 | 23.87 | 46.1 | 74.5160 |
| 1.0432 | 0.3724 | 2700 | 1.2414 | 22.31 | 44.91 | 86.9878 |
| 1.1588 | 0.3862 | 2800 | 1.2051 | 23.32 | 45.94 | 77.1724 |
| 1.0062 | 0.4 | 2900 | 1.2059 | 26.15 | 48.27 | 69.4282 |
| 0.9178 | 0.4138 | 3000 | 1.1756 | 29.13 | 48.92 | 64.7456 |
| 0.9108 | 0.4276 | 3100 | 1.1665 | 28.34 | 48.9 | 67.2220 |
| 0.9868 | 0.4414 | 3200 | 1.1489 | 25.64 | 48.93 | 75.3264 |
| 0.9563 | 0.4552 | 3300 | 1.1181 | 27.58 | 49.67 | 71.8145 |
| 0.9138 | 0.4690 | 3400 | 1.1247 | 28.37 | 50.96 | 71.4543 |
| 0.8508 | 0.4828 | 3500 | 1.1007 | 29.75 | 51.41 | 68.3476 |
| 0.836 | 0.4966 | 3600 | 1.1114 | 30.99 | 52.2 | 66.5916 |
| 0.8435 | 0.5103 | 3700 | 1.0782 | 30.64 | 52.77 | 68.2125 |
| 0.8323 | 0.5241 | 3800 | 1.0744 | 29.78 | 52.94 | 68.9779 |
| 0.818 | 0.5379 | 3900 | 1.0639 | 31.23 | 53.21 | 67.7623 |
| 0.8095 | 0.5517 | 4000 | 1.0576 | 31.02 | 53.51 | 68.5277 |
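
The Epoch column ends at 0.5517, so the 4000 training steps covered a little over half of one pass through the training data. As a rough back-of-the-envelope check, the sketch below estimates the training set size from the step count, the batch size of 16, and the logged epoch fraction. It assumes a single device and no gradient accumulation, neither of which the card states, so the result is an order-of-magnitude estimate only; `estimate_train_examples` is a hypothetical helper.

```python
def estimate_train_examples(step: int, epoch_fraction: float,
                            batch_size: int = 16) -> float:
    """Estimate the training set size from logged progress.

    Assumption (not stated in the card): one device and no gradient
    accumulation, so each step consumes exactly `batch_size` examples.
    """
    examples_seen = step * batch_size
    return examples_seen / epoch_fraction


# Final and first logged rows of the training results.
final_estimate = estimate_train_examples(4000, 0.5517)
first_estimate = estimate_train_examples(100, 0.0138)
print(round(final_estimate), round(first_estimate))
```

Both rows give an estimate on the order of 116,000 training examples under these assumptions.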

Framework versions

  • Transformers 4.41.2
  • PyTorch 2.2.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model size

  • 764M params (Safetensors, tensor type F32)

Model ID: ymoslem/whisper-medium-ga2en-v6.3.0-r

Evaluation results

  • BLEU on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, Wikimedia, and EUbookshop (self-reported): 31.020
  • WER on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, Wikimedia, and EUbookshop (self-reported): 68.528