Whisper Medium GA-EN Speech Translation, 1 epoch, 10k steps

This model is a fine-tuned version of openai/whisper-medium on the IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia datasets. It achieves the following results on the evaluation set:

  • Loss: 1.3134
  • Bleu: 36.12
  • Chrf: 53.74
  • Wer: 58.3071
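
The Wer figure is on a percentage scale, and some rows in the training log exceed 100. The card does not say which implementation produced its numbers, so the following is only a minimal sketch of the standard word error rate definition (word-level edit distance divided by the number of reference words). The function name `word_error_rate` is illustrative and not part of the original training code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 100.0 * prev[-1] / len(ref)


# Three inserted words against a two-word reference give a WER above 100.
print(word_error_rate("hello world", "hello there big wide world"))
```

Because insertions count as errors but the denominator is the reference length, a hypothesis much longer than its reference can score above 100, which is consistent with the early rows of the training log.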

Model description

More information needed

Intended uses & limitations

More information needed
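
Until this section is filled in, the following is a hedged sketch of how the checkpoint might be loaded for Irish-to-English speech translation with the Transformers `pipeline` API. It assumes that `transformers`, a PyTorch backend, and an audio decoder such as ffmpeg are installed, and that the checkpoint's saved generation config already selects the right language and task. The function name `translate_irish_audio` and the audio path are hypothetical.

```python
MODEL_ID = "ymoslem/whisper-medium-ga2en-v5.2.1-r"  # repository name as shown on this page


def translate_irish_audio(audio_path: str) -> str:
    """Return the English text produced by the model for one audio file.

    Assumption: the default generation settings stored with the checkpoint
    are suitable; the card does not document any required generate kwargs.
    """
    # Imported lazily so that merely defining this function does not require
    # transformers to be installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    result = asr(audio_path)
    return result["text"]


# Example call (downloads the weights on first use):
# print(translate_irish_audio("sample_irish.wav"))
```

For repeated use it would be cheaper to build the pipeline once and reuse it, rather than constructing it on every call as this sketch does.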

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.02
  • training_steps: 10000
  • mixed_precision_training: Native AMP
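
To make the scheduler settings concrete, here is a small sketch of the learning rate these values imply under a linear schedule with warmup: a ramp from 0 to the peak rate over the first 2% of steps, then a linear decay to 0 at step 10000. It is written in plain Python as an illustration, not taken from the training script, and the function name `linear_schedule_lr` is hypothetical.

```python
import math


def linear_schedule_lr(
    step: int,
    peak_lr: float = 1e-4,       # learning_rate
    total_steps: int = 10_000,   # training_steps
    warmup_ratio: float = 0.02,  # lr_scheduler_warmup_ratio
) -> float:
    """Learning rate at a given optimizer step for linear warmup + linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Linear decay from peak_lr down to 0 at total_steps.
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


for s in (0, 100, 200, 5100, 10_000):
    print(s, linear_schedule_lr(s))
```

With the values listed above, warmup lasts 200 steps, so the peak rate of 0.0001 is reached at step 200 and has halved again by step 5100.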

Training results

Training Loss Epoch Step Bleu Chrf Validation Loss Wer
2.6291 0.0109 100 2.33 16.34 2.1971 175.5516
2.6591 0.0219 200 5.57 22.49 2.0357 122.2873
2.5637 0.0328 300 7.67 26.29 1.8690 133.0032
2.2954 0.0438 400 11.2 30.03 1.8062 114.2278
2.3292 0.0547 500 9.85 29.28 1.7421 117.2895
2.1223 0.0657 600 14.56 32.56 1.6739 84.2864
2.2398 0.0766 700 13.86 34.74 1.7187 98.9644
2.002 0.0876 800 15.53 36.64 1.6392 96.7582
1.8611 0.0985 900 15.8 36.32 1.6283 94.3719
1.8498 0.1095 1000 17.58 36.0 1.6102 85.5921
1.7585 0.1204 1100 15.91 36.61 1.6337 100.2251
1.6115 0.1314 1200 22.21 39.94 1.5381 76.8122
1.4415 0.1423 1300 20.36 37.87 1.5864 79.1986
1.5103 0.1533 1400 23.2 41.26 1.4925 75.2364
1.6576 0.1642 1500 18.12 40.49 1.4508 102.9266
1.3429 0.1752 1600 27.88 43.74 1.4399 69.7884
1.2522 0.1861 1700 23.04 43.31 1.4256 77.1724
1.2018 0.1970 1800 21.06 40.39 1.4072 78.6583
1.1945 0.2080 1900 23.0 42.71 1.4222 76.7222
1.1869 0.2189 2000 22.54 42.02 1.3992 75.8667
1.1752 0.2299 2100 20.81 41.07 1.3926 79.5137
1.0281 0.2408 2200 27.24 45.55 1.3633 69.6083
0.894 0.2518 2300 28.6 45.58 1.3287 65.8712
0.9788 0.2627 2400 27.75 46.21 1.3138 69.2931
0.8418 0.2737 2500 27.85 46.17 1.3064 68.3026
0.7559 0.2846 2600 28.44 48.52 1.2903 68.3476
0.8632 0.2956 2700 27.87 46.86 1.2834 68.3476
0.7501 0.3065 2800 28.63 49.25 1.2669 68.5277
0.6953 0.3175 2900 30.46 48.83 1.2615 64.4304
0.7195 0.3284 3000 27.49 47.94 1.2514 71.0941
0.6155 0.3394 3100 30.06 49.64 1.2428 66.5916
0.605 0.3503 3200 31.64 50.27 1.2040 63.8451
0.6349 0.3612 3300 28.96 49.35 1.2077 65.3760
0.4669 0.3722 3400 31.17 48.95 1.2219 64.2503
0.5196 0.3831 3500 30.97 50.13 1.2124 63.8001
0.5141 0.3941 3600 31.97 50.8 1.2026 63.0347
0.4221 0.4050 3700 31.76 51.35 1.1893 63.4399
0.2951 0.4160 3800 32.4 51.08 1.2049 63.1247
0.3898 0.4269 3900 32.15 51.09 1.1906 63.5299
0.4071 0.4379 4000 33.1 51.85 1.1873 62.4043
0.3975 0.4488 4100 29.58 49.33 1.2117 70.3287
0.4206 0.4598 4200 31.69 50.8 1.2150 65.0158
0.2935 0.4707 4300 32.9 50.01 1.2484 62.8546
0.3718 0.4817 4400 31.64 50.55 1.2055 63.8451
0.3722 0.4926 4500 28.16 49.28 1.2200 70.4638
0.2986 0.5036 4600 28.76 49.9 1.2240 68.7528
0.3327 0.5145 4700 29.34 49.67 1.2052 67.5822
0.2489 0.5255 4800 32.52 51.77 1.2083 62.4493
0.3653 0.5364 4900 31.48 51.16 1.2166 63.8451
0.3326 0.5473 5000 33.04 51.71 1.2169 62.4493
0.3045 0.5583 5100 27.45 48.22 1.2460 68.9779
0.3444 0.5692 5200 33.14 50.76 1.2829 62.2692
0.3236 0.5802 5300 28.89 49.37 1.2499 70.3737
0.3004 0.5911 5400 29.89 49.29 1.3165 68.7078
0.3019 0.6021 5500 32.8 49.78 1.2782 62.8095
0.2923 0.6130 5600 31.75 50.26 1.2468 63.3498
0.3237 0.6240 5700 34.4 52.59 1.2511 61.0986
0.2226 0.6349 5800 30.51 50.38 1.2479 63.3498
0.2207 0.6459 5900 32.68 51.97 1.2641 62.1342
0.2017 0.6568 6000 32.47 51.36 1.2640 62.6745
0.201 0.6678 6100 33.6 52.29 1.2774 61.4588
0.203 0.6787 6200 30.27 50.84 1.2670 65.6461
0.1456 0.6897 6300 31.2 51.05 1.2656 63.3048
0.1607 0.7006 6400 30.39 51.04 1.2611 65.8262
0.1933 0.7115 6500 31.78 50.92 1.2545 63.0797
0.1537 0.7225 6600 30.18 50.18 1.2500 64.7006
0.1279 0.7334 6700 33.23 51.0 1.2548 59.8379
0.1189 0.7444 6800 33.51 50.67 1.2594 61.1887
0.1056 0.7553 6900 32.97 51.02 1.2578 61.9991
0.1105 0.7663 7000 32.74 50.83 1.2569 62.0441
0.1183 0.7772 7100 34.07 52.2 1.2590 60.4232
0.1373 0.7882 7200 33.55 50.6 1.2430 61.2787
0.1325 0.7991 7300 32.36 50.39 1.2548 62.3143
0.0907 0.8101 7400 32.28 50.99 1.2578 61.2787
0.0919 0.8210 7500 33.01 51.81 1.2791 60.4683
0.0852 0.8320 7600 32.97 51.56 1.2782 61.5489
0.1223 0.8429 7700 33.57 52.33 1.2638 59.9280
0.0826 0.8539 7800 33.83 52.7 1.2634 60.1531
0.0783 0.8648 7900 33.79 52.31 1.2595 60.1081
0.0986 0.8758 8000 34.33 52.54 1.2608 59.4327
0.1148 0.8867 8100 34.03 52.52 1.2736 59.8829
0.1134 0.8976 8200 34.14 51.64 1.3073 61.5038
0.1166 0.9086 8300 30.51 49.26 1.3385 65.5561
0.0871 0.9195 8400 32.31 51.06 1.3313 62.5394
0.0927 0.9305 8500 28.64 48.43 1.3898 69.3832
0.1012 0.9414 8600 33.12 52.02 1.3144 61.4138
0.0742 0.9524 8700 33.68 51.38 1.3284 61.7740
0.0802 0.9633 8800 34.33 51.38 1.3300 61.4138
0.0799 0.9743 8900 33.72 50.77 1.3328 60.1981
0.0936 0.9852 9000 34.76 51.4 1.3181 60.0630
0.1091 0.9962 9100 35.13 52.6 1.3096 59.9730
0.0427 1.0071 9200 35.49 53.12 1.2905 59.8379
0.0338 1.0181 9300 35.33 52.62 1.3097 60.5133
0.0363 1.0290 9400 35.51 53.06 1.3172 59.6128
0.0319 1.0400 9500 36.82 53.6 1.3166 58.3971
0.0434 1.0509 9600 35.62 53.28 1.3050 59.6578
0.0218 1.0619 9700 35.57 53.28 1.3096 59.5227
0.0316 1.0728 9800 36.14 53.87 1.3162 58.3971
0.0315 1.0837 9900 36.26 54.16 1.3121 58.3521
0.0229 1.0947 10000 36.12 53.74 1.3134 58.3071
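
The log shows that the metrics do not all peak at the same step. As an illustrative sketch, the snippet below copies a few evaluation rows from the table and picks the best one under each criterion. Only a handful of rows are included, so the result describes this sample rather than a full parse of the log; the `EvalRow` type is a helper introduced here for illustration.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    step: int
    bleu: float
    chrf: float
    val_loss: float
    wer: float


# A few rows copied from the training results table.
ROWS = [
    EvalRow(4000, 33.1, 51.85, 1.1873, 62.4043),
    EvalRow(8000, 34.33, 52.54, 1.2608, 59.4327),
    EvalRow(9500, 36.82, 53.6, 1.3166, 58.3971),
    EvalRow(9900, 36.26, 54.16, 1.3121, 58.3521),
    EvalRow(10000, 36.12, 53.74, 1.3134, 58.3071),
]

best_bleu = max(ROWS, key=lambda r: r.bleu)          # higher is better
best_chrf = max(ROWS, key=lambda r: r.chrf)          # higher is better
lowest_loss = min(ROWS, key=lambda r: r.val_loss)    # lower is better
lowest_wer = min(ROWS, key=lambda r: r.wer)          # lower is better

print("best BLEU at step", best_bleu.step)
print("best chrF at step", best_chrf.step)
print("lowest validation loss at step", lowest_loss.step)
print("lowest WER at step", lowest_wer.step)
```

In this sample the validation loss is lowest at step 4000, while BLEU, chrF, and WER keep improving until late in training, so the choice of checkpoint depends on which metric matters for the intended use.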

Framework versions

  • Transformers 4.41.2
  • PyTorch 2.2.0+cu121
  • Datasets 2.19.2
  • Tokenizers 0.19.1

Model size: 764M params (Safetensors), tensor type F32

Model ID: ymoslem/whisper-medium-ga2en-v5.2.1-r

Evaluation results

  • Bleu on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia
    self-reported
    36.120
  • Wer on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia
    self-reported
    58.307