Whisper Medium GA-EN Speech Translation

This model is a fine-tuned version of openai/whisper-medium on the IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia datasets. It achieves the following results on the evaluation set:

  • Loss: 1.1869
  • Bleu: 30.79
  • Chrf: 52.18
  • Wer: 66.5916
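
WER counts word substitutions, deletions, and insertions against a reference, then divides by the number of reference words. Because insertions are not capped, WER can exceed 100, as several early checkpoints in the training results do. For this model it is presumably computed between the English output and the reference translations. The sketch below is a plain-Python implementation of the standard corpus-level definition. The card does not say which WER implementation or text normalization (casing, punctuation) produced its numbers, so scores from this sketch may not match them exactly.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum number of word substitutions, deletions, and insertions."""
    # prev[j] is the distance between the previous prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference word r
                curr[j - 1] + 1,         # insert hypothesis word h
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            )
        prev = curr
    return prev[-1]


def corpus_wer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus WER in percent: total edits divided by total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("need one hypothesis per reference")
    errors = sum(edit_distance(r.split(), h.split())
                 for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    if words == 0:
        raise ValueError("references contain no words")
    return 100.0 * errors / words


# Three inserted words against a one-word reference gives a WER above 100.
print(corpus_wer(["hello"], ["well hello there friend"]))  # 300.0
```

Errors are summed over the whole corpus before dividing. Averaging per-sentence WERs instead would weight short sentences more heavily.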

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.02
  • training_steps: 4000
  • mixed_precision_training: Native AMP
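
Readers reproducing the run with the Hugging Face Trainer can map the hyperparameters above onto `Seq2SeqTrainingArguments` keyword arguments. The sketch below records that mapping as a plain dictionary and re-derives the learning-rate schedule, so it runs without `transformers` installed. It rests on two assumptions:

  • Training ran on a single device, so `train_batch_size` equals `per_device_train_batch_size` with no gradient accumulation.
  • "Native AMP" means fp16 rather than bf16.

The schedule function mirrors the documented behaviour of `transformers.get_linear_schedule_with_warmup`, with warmup steps derived from `warmup_ratio` by rounding up. It is an illustration, not the author's training script.

```python
import math

# Hypothetical mapping of the card's hyperparameters onto
# Seq2SeqTrainingArguments keyword names (assumes one device and fp16 AMP).
TRAINING_KWARGS = {
    "learning_rate": 1e-4,
    "per_device_train_batch_size": 16,  # assumption: single device, no accumulation
    "per_device_eval_batch_size": 16,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_ratio": 0.02,
    "max_steps": 4000,
    "fp16": True,  # assumption: "Native AMP" could also have been bf16
}


def warmup_steps(max_steps: int, warmup_ratio: float) -> int:
    """Warmup length derived from a ratio, rounded up."""
    return math.ceil(max_steps * warmup_ratio)


def linear_lr(step: int, base_lr: float, num_warmup: int, num_training: int) -> float:
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < num_warmup:
        return base_lr * step / max(1, num_warmup)
    remaining = (num_training - step) / max(1, num_training - num_warmup)
    return base_lr * max(0.0, remaining)


n_warmup = warmup_steps(TRAINING_KWARGS["max_steps"], TRAINING_KWARGS["warmup_ratio"])
for step in (0, 40, n_warmup, 2040, 4000):
    lr = linear_lr(step, TRAINING_KWARGS["learning_rate"], n_warmup,
                   TRAINING_KWARGS["max_steps"])
    print(step, lr)
```

With these values warmup lasts 80 steps. The learning rate climbs to 1e-4 at step 80, is halfway down at step 2040, and reaches 0 at step 4000. To use the dictionary with the real library, unpack it into `Seq2SeqTrainingArguments` together with an output directory of your choice.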

Training results

Training Loss | Epoch | Step | Validation Loss | Bleu | Chrf | Wer
2.7614 | 0.0328 | 100 | 2.1538 | 3.55 | 19.32 | 146.1954
2.6486 | 0.0657 | 200 | 1.9662 | 7.4 | 25.58 | 132.4178
2.5302 | 0.0985 | 300 | 1.8689 | 6.8 | 24.76 | 172.0846
2.4506 | 0.1314 | 400 | 1.8244 | 13.88 | 32.7 | 95.9478
2.3631 | 0.1642 | 500 | 1.6759 | 14.35 | 32.54 | 95.6776
2.173 | 0.1970 | 600 | 1.7051 | 13.14 | 34.29 | 93.2463
2.3489 | 0.2299 | 700 | 1.6077 | 15.85 | 36.52 | 87.7082
2.0183 | 0.2627 | 800 | 1.5894 | 15.72 | 36.54 | 94.1018
2.1502 | 0.2956 | 900 | 1.5739 | 16.58 | 36.4 | 96.7132
2.016 | 0.3284 | 1000 | 1.5470 | 18.99 | 39.22 | 83.9712
1.747 | 0.3612 | 1100 | 1.5428 | 15.49 | 38.02 | 101.5759
1.6728 | 0.3941 | 1200 | 1.5129 | 19.45 | 39.24 | 89.4642
1.6476 | 0.4269 | 1300 | 1.4747 | 21.77 | 40.53 | 82.8906
1.6764 | 0.4598 | 1400 | 1.4672 | 16.58 | 39.94 | 95.0923
1.5683 | 0.4926 | 1500 | 1.4116 | 22.0 | 42.55 | 77.8028
1.3607 | 0.5255 | 1600 | 1.4290 | 24.15 | 43.36 | 74.1108
1.4888 | 0.5583 | 1700 | 1.3684 | 22.61 | 42.39 | 83.5660
1.4222 | 0.5911 | 1800 | 1.3791 | 25.68 | 46.64 | 70.1036
1.3456 | 0.6240 | 1900 | 1.3312 | 26.77 | 46.59 | 68.1225
1.1232 | 0.6568 | 2000 | 1.3433 | 27.24 | 44.86 | 72.2197
1.1674 | 0.6897 | 2100 | 1.3228 | 23.2 | 45.04 | 84.1513
1.0711 | 0.7225 | 2200 | 1.2771 | 25.23 | 46.41 | 76.5421
1.2015 | 0.7553 | 2300 | 1.2549 | 24.98 | 47.79 | 84.4214
1.1339 | 0.7882 | 2400 | 1.2758 | 27.01 | 48.04 | 72.2197
1.0196 | 0.8210 | 2500 | 1.2501 | 26.74 | 48.19 | 72.8050
0.9275 | 0.8539 | 2600 | 1.2430 | 32.42 | 50.41 | 62.2692
0.8328 | 0.8867 | 2700 | 1.2413 | 30.63 | 50.63 | 65.5110
0.7923 | 0.9195 | 2800 | 1.2441 | 26.18 | 48.19 | 74.0207
0.8887 | 0.9524 | 2900 | 1.2109 | 30.91 | 50.87 | 62.3593
0.7954 | 0.9852 | 3000 | 1.2233 | 31.63 | 49.93 | 64.5205
0.2886 | 1.0181 | 3100 | 1.2340 | 28.74 | 49.67 | 72.9401
0.2889 | 1.0509 | 3200 | 1.2369 | 31.74 | 49.23 | 63.6650
0.2812 | 1.0837 | 3300 | 1.2589 | 32.95 | 50.09 | 63.6200
0.2634 | 1.1166 | 3400 | 1.2428 | 30.14 | 49.93 | 69.9685
0.2248 | 1.1494 | 3500 | 1.2486 | 33.38 | 50.48 | 62.0441
0.2266 | 1.1823 | 3600 | 1.2089 | 29.8 | 50.46 | 67.2220
0.2148 | 1.2151 | 3700 | 1.1988 | 31.57 | 51.49 | 64.0252
0.2345 | 1.2479 | 3800 | 1.1889 | 32.46 | 52.24 | 64.1603
0.1924 | 1.2808 | 3900 | 1.1888 | 29.38 | 51.57 | 72.6700
0.2056 | 1.3136 | 4000 | 1.1869 | 30.79 | 52.18 | 66.5916
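
The scores at the top of this card match the final row (step 4000), which is not the best row for every metric. The Python sketch below loads the table and picks the best step per metric. It uses only the numbers in the table and says nothing about which checkpoints were saved or published.

```python
# Rows copied from the training-results table above, in its column order:
# training loss, epoch, step, validation loss, BLEU, chrF, WER.
RESULTS = """\
2.7614 0.0328 100 2.1538 3.55 19.32 146.1954
2.6486 0.0657 200 1.9662 7.4 25.58 132.4178
2.5302 0.0985 300 1.8689 6.8 24.76 172.0846
2.4506 0.1314 400 1.8244 13.88 32.7 95.9478
2.3631 0.1642 500 1.6759 14.35 32.54 95.6776
2.173 0.1970 600 1.7051 13.14 34.29 93.2463
2.3489 0.2299 700 1.6077 15.85 36.52 87.7082
2.0183 0.2627 800 1.5894 15.72 36.54 94.1018
2.1502 0.2956 900 1.5739 16.58 36.4 96.7132
2.016 0.3284 1000 1.5470 18.99 39.22 83.9712
1.747 0.3612 1100 1.5428 15.49 38.02 101.5759
1.6728 0.3941 1200 1.5129 19.45 39.24 89.4642
1.6476 0.4269 1300 1.4747 21.77 40.53 82.8906
1.6764 0.4598 1400 1.4672 16.58 39.94 95.0923
1.5683 0.4926 1500 1.4116 22.0 42.55 77.8028
1.3607 0.5255 1600 1.4290 24.15 43.36 74.1108
1.4888 0.5583 1700 1.3684 22.61 42.39 83.5660
1.4222 0.5911 1800 1.3791 25.68 46.64 70.1036
1.3456 0.6240 1900 1.3312 26.77 46.59 68.1225
1.1232 0.6568 2000 1.3433 27.24 44.86 72.2197
1.1674 0.6897 2100 1.3228 23.2 45.04 84.1513
1.0711 0.7225 2200 1.2771 25.23 46.41 76.5421
1.2015 0.7553 2300 1.2549 24.98 47.79 84.4214
1.1339 0.7882 2400 1.2758 27.01 48.04 72.2197
1.0196 0.8210 2500 1.2501 26.74 48.19 72.8050
0.9275 0.8539 2600 1.2430 32.42 50.41 62.2692
0.8328 0.8867 2700 1.2413 30.63 50.63 65.5110
0.7923 0.9195 2800 1.2441 26.18 48.19 74.0207
0.8887 0.9524 2900 1.2109 30.91 50.87 62.3593
0.7954 0.9852 3000 1.2233 31.63 49.93 64.5205
0.2886 1.0181 3100 1.2340 28.74 49.67 72.9401
0.2889 1.0509 3200 1.2369 31.74 49.23 63.6650
0.2812 1.0837 3300 1.2589 32.95 50.09 63.6200
0.2634 1.1166 3400 1.2428 30.14 49.93 69.9685
0.2248 1.1494 3500 1.2486 33.38 50.48 62.0441
0.2266 1.1823 3600 1.2089 29.8 50.46 67.2220
0.2148 1.2151 3700 1.1988 31.57 51.49 64.0252
0.2345 1.2479 3800 1.1889 32.46 52.24 64.1603
0.1924 1.2808 3900 1.1888 29.38 51.57 72.6700
0.2056 1.3136 4000 1.1869 30.79 52.18 66.5916
"""

COLUMNS = ("train_loss", "epoch", "step", "val_loss", "bleu", "chrf", "wer")


def parse_rows(text: str) -> list[dict]:
    """Turn whitespace-separated table rows into dicts keyed by COLUMNS."""
    rows = []
    for line in text.strip().splitlines():
        row = dict(zip(COLUMNS, (float(v) for v in line.split())))
        row["step"] = int(row["step"])
        rows.append(row)
    return rows


rows = parse_rows(RESULTS)

# Higher is better for BLEU and chrF; lower is better for WER and loss.
best_bleu = max(rows, key=lambda r: r["bleu"])
best_chrf = max(rows, key=lambda r: r["chrf"])
best_wer = min(rows, key=lambda r: r["wer"])
best_val = min(rows, key=lambda r: r["val_loss"])

# Steps per epoch implied by the last row (epoch is rounded to 4 decimals).
final = rows[-1]
steps_per_epoch = final["step"] / final["epoch"]

for name, row in [("BLEU", best_bleu), ("chrF", best_chrf),
                  ("WER", best_wer), ("val loss", best_val)]:
    print(f"best {name}: step {row['step']}")
print(f"~{steps_per_epoch:.0f} steps per epoch")
```

Run as written, the script reports these best steps:

  • BLEU (33.38) and WER (62.0441): step 3500
  • chrF (52.24): step 3800
  • Validation loss: step 4000

It also finds about 3,045 optimizer steps per epoch. If 16 is the effective batch size, which the card does not confirm, that is roughly 48.7k training examples per epoch. The sharp drop in training loss between steps 3000 and 3100 coincides with the epoch counter passing 1.0, that is, the start of a second pass over the training data.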

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.2.0+cu121
  • Datasets 2.19.2
  • Tokenizers 0.19.1

Model size

  • 764M params (Safetensors, tensor type F32)
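
The listed size gives a quick estimate of the memory needed for the weights alone: parameters times bytes per parameter. The sketch below uses the rounded 764M figure and ignores activations, decoding caches, and optimizer state. The half-precision row only illustrates what casting would save; the card does not report a half-precision checkpoint.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"F32": 4, "F16": 2, "BF16": 2}


def weight_bytes(num_params: int, dtype: str = "F32") -> int:
    """Memory for the raw weights only (no activations, caches, or optimizer state)."""
    return num_params * BYTES_PER_PARAM[dtype]


NUM_PARAMS = 764_000_000  # rounded "764M params" from the card

for dtype in ("F32", "F16"):
    size = weight_bytes(NUM_PARAMS, dtype)
    print(f"{dtype}: {size / 1e9:.2f} GB ({size / 2**30:.2f} GiB)")
```

This prints about 3.06 GB (2.85 GiB) for F32 and 1.53 GB (1.42 GiB) for F16.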

Model repository

  • ymoslem/whisper-medium-ga2en-v3.2-r

Evaluation results

Self-reported on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia:

  • Bleu: 30.790
  • Wer: 66.592