Whisper Medium GA-EN Speech Translation

This model is a fine-tuned version of openai/whisper-medium on the IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia datasets. It achieves the following results on the evaluation set at the final checkpoint (step 4000):

  • Loss: 1.1842
  • Bleu: 28.6
  • Chrf: 49.54
  • Wer: 68.5277
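Word error rate (WER) is the word-level edit distance (substitutions, deletions, and insertions) between a hypothesis and its reference, divided by the number of reference words. The values in this card appear to be scaled to percentages. Because insertions are counted, WER is not capped at 100, which is why several early checkpoints in the training results score above 100. The following is a minimal pure-Python sketch of that definition, not the evaluation code used for this model; the function name `wer` is our own, and the text normalization (casing, punctuation) that real evaluations usually apply first is omitted.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: word-level Levenshtein distance / reference length * 100."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev holds the previous DP row: distances from the first i-1 reference
    # words to every prefix of the hypothesis (row 0 = pure insertions).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


print(wer("the cat sat", "the cat sat"))              # 0.0
print(wer("the cat sat", "a cat sat down here now"))  # about 133.3: 1 substitution + 3 insertions
```

The second call shows how a hypothesis that is much longer than its reference pushes WER past 100.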

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.03
  • training_steps: 4000
  • mixed_precision_training: Native AMP
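With `lr_scheduler_type: linear` and `lr_scheduler_warmup_ratio: 0.03` over 4000 training steps, the learning rate ramps up linearly to the peak value of 0.0001 and then decays linearly to zero. The sketch below reproduces that shape in plain Python. It assumes the Hugging Face Trainer conventions of rounding the warmup length up (120 warmup steps here) and of reaching exactly zero at the final step; the function name `linear_warmup_lr` is hypothetical and not part of any library.

```python
import math

PEAK_LR = 1e-4        # learning_rate
TOTAL_STEPS = 4000    # training_steps
WARMUP_RATIO = 0.03   # lr_scheduler_warmup_ratio

# Assumption: warmup length is rounded up, as in the Hugging Face Trainer.
WARMUP_STEPS = math.ceil(WARMUP_RATIO * TOTAL_STEPS)  # 120


def linear_warmup_lr(step: int) -> float:
    """Learning rate at a given optimizer step under linear warmup + linear decay."""
    if step < WARMUP_STEPS:
        # Warmup: ramp linearly from 0 up to PEAK_LR.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Decay: fall linearly from PEAK_LR at the end of warmup to 0 at TOTAL_STEPS.
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


for s in (0, 60, 120, 2060, 4000):
    print(s, linear_warmup_lr(s))
```

Under these assumptions the peak is reached at step 120, and the rate is back to half the peak at step 2060, midway through the decay phase.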

Training results

Training Loss  Epoch   Step  Bleu   Chrf   Validation Loss  Wer
2.5585         0.0109  100   2.94   16.53  2.1737           176.2269
2.5748         0.0219  200   6.93   24.84  2.0289           101.8460
2.554          0.0328  300   7.16   25.31  1.8861           142.4584
2.276          0.0438  400   8.9    27.72  1.8568           119.7208
2.3077         0.0547  500   14.51  32.58  1.7492           88.3836
2.0852         0.0657  600   16.71  34.1   1.6548           83.6560
2.1602         0.0766  700   14.93  35.2   1.6063           106.4385
1.9556         0.0876  800   20.64  36.74  1.6190           77.5777
1.7516         0.0985  900   15.44  36.67  1.5614           95.0023
1.7502         0.1095  1000  20.65  38.42  1.5317           81.4948
1.6851         0.1204  1100  19.13  37.91  1.5289           87.7533
1.5154         0.1314  1200  19.79  41.21  1.4906           83.3408
1.3658         0.1423  1300  19.58  39.16  1.4623           96.3980
1.3828         0.1533  1400  22.84  42.83  1.4069           77.5777
1.5339         0.1642  1500  20.91  41.62  1.3909           86.5376
1.2441         0.1752  1600  23.35  43.43  1.3726           75.5966
1.1607         0.1861  1700  20.4   42.41  1.3471           85.4120
1.1043         0.1970  1800  21.13  43.4   1.3332           81.5849
1.0698         0.2080  1900  23.84  44.54  1.3413           73.3904
1.0698         0.2189  2000  28.34  47.2   1.2848           66.9068
1.053          0.2299  2100  25.19  46.75  1.2951           73.1652
0.9139         0.2408  2200  28.43  47.11  1.2852           70.7789
0.742          0.2518  2300  30.5   47.62  1.2580           63.6200
0.8627         0.2627  2400  29.97  48.38  1.2308           66.2314
0.7213         0.2737  2500  22.96  46.55  1.2176           83.7010
0.672          0.2846  2600  27.35  48.02  1.2272           71.7695
0.784          0.2956  2700  31.16  50.83  1.2010           65.3760
0.6463         0.3065  2800  30.67  51.24  1.1884           64.9257
0.6028         0.3175  2900  32.07  51.3   1.1866           61.4588
0.6494         0.3284  3000  32.04  50.96  1.1768           63.3048
0.657          0.3394  3100  30.55  50.18  1.2126           66.0964
0.6239         0.3503  3200  33.69  52.06  1.1836           60.2431
0.63           0.3612  3300  32.14  51.62  1.2201           61.7290
0.5155         0.3722  3400  32.62  51.99  1.1956           61.3688
0.5392         0.3831  3500  31.13  51.37  1.2010           63.9802
0.5159         0.3941  3600  32.2   51.81  1.1831           62.4043
0.4535         0.4050  3700  31.61  51.77  1.1744           63.3949
0.3346         0.4160  3800  30.67  50.21  1.2066           65.4660
0.3991         0.4269  3900  30.7   50.88  1.1870           65.2409
0.395          0.4379  4000  28.6   49.54  1.1842           68.5277
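The summary metrics at the top of this card correspond to the final checkpoint (step 4000). Scanning the whole evaluation log shows that the best Bleu, best Chrf, and lowest Wer were all reached at step 3200, while the lowest validation loss occurred at step 3700. The sketch below performs that scan. The tuples are transcribed from the training results in the order (step, Bleu, Chrf, Validation Loss, Wer); the variable names are our own.

```python
# (step, bleu, chrf, val_loss, wer) for each evaluation in the training log
LOG = [
    (100, 2.94, 16.53, 2.1737, 176.2269), (200, 6.93, 24.84, 2.0289, 101.8460),
    (300, 7.16, 25.31, 1.8861, 142.4584), (400, 8.9, 27.72, 1.8568, 119.7208),
    (500, 14.51, 32.58, 1.7492, 88.3836), (600, 16.71, 34.1, 1.6548, 83.6560),
    (700, 14.93, 35.2, 1.6063, 106.4385), (800, 20.64, 36.74, 1.6190, 77.5777),
    (900, 15.44, 36.67, 1.5614, 95.0023), (1000, 20.65, 38.42, 1.5317, 81.4948),
    (1100, 19.13, 37.91, 1.5289, 87.7533), (1200, 19.79, 41.21, 1.4906, 83.3408),
    (1300, 19.58, 39.16, 1.4623, 96.3980), (1400, 22.84, 42.83, 1.4069, 77.5777),
    (1500, 20.91, 41.62, 1.3909, 86.5376), (1600, 23.35, 43.43, 1.3726, 75.5966),
    (1700, 20.4, 42.41, 1.3471, 85.4120), (1800, 21.13, 43.4, 1.3332, 81.5849),
    (1900, 23.84, 44.54, 1.3413, 73.3904), (2000, 28.34, 47.2, 1.2848, 66.9068),
    (2100, 25.19, 46.75, 1.2951, 73.1652), (2200, 28.43, 47.11, 1.2852, 70.7789),
    (2300, 30.5, 47.62, 1.2580, 63.6200), (2400, 29.97, 48.38, 1.2308, 66.2314),
    (2500, 22.96, 46.55, 1.2176, 83.7010), (2600, 27.35, 48.02, 1.2272, 71.7695),
    (2700, 31.16, 50.83, 1.2010, 65.3760), (2800, 30.67, 51.24, 1.1884, 64.9257),
    (2900, 32.07, 51.3, 1.1866, 61.4588), (3000, 32.04, 50.96, 1.1768, 63.3048),
    (3100, 30.55, 50.18, 1.2126, 66.0964), (3200, 33.69, 52.06, 1.1836, 60.2431),
    (3300, 32.14, 51.62, 1.2201, 61.7290), (3400, 32.62, 51.99, 1.1956, 61.3688),
    (3500, 31.13, 51.37, 1.2010, 63.9802), (3600, 32.2, 51.81, 1.1831, 62.4043),
    (3700, 31.61, 51.77, 1.1744, 63.3949), (3800, 30.67, 50.21, 1.2066, 65.4660),
    (3900, 30.7, 50.88, 1.1870, 65.2409), (4000, 28.6, 49.54, 1.1842, 68.5277),
]

# Higher is better for Bleu/Chrf; lower is better for validation loss and Wer.
best_bleu = max(LOG, key=lambda row: row[1])
best_chrf = max(LOG, key=lambda row: row[2])
lowest_val_loss = min(LOG, key=lambda row: row[3])
lowest_wer = min(LOG, key=lambda row: row[4])

print("best Bleu:", best_bleu[1], "at step", best_bleu[0])
print("best Chrf:", best_chrf[2], "at step", best_chrf[0])
print("lowest validation loss:", lowest_val_loss[3], "at step", lowest_val_loss[0])
print("lowest Wer:", lowest_wer[4], "at step", lowest_wer[0])
print("final checkpoint:", LOG[-1])
```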

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.2.0+cu121
  • Datasets 2.19.2
  • Tokenizers 0.19.1

Model weights

  • Format: Safetensors
  • Model size: 764M params
  • Tensor type: F32

Model tree for ymoslem/whisper-medium-ga2en-v5.3-r

  • Fine-tuned from: openai/whisper-medium

Evaluation results (self-reported, on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia)

  • Bleu: 28.600
  • Wer: 68.528