Whisper Medium GA-EN Speech Translation

This model is a fine-tuned version of openai/whisper-medium for Irish-to-English (GA-EN) speech translation, trained on the IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, Wikimedia, and EUbookshop datasets. It achieves the following results on the evaluation set:

  • Loss: 1.0491
  • BLEU: 27.38
  • chrF: 51.97
  • WER: 72.1747
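
The WER figures reported here can exceed 100 (the training results show 269.1130 at step 100), which is expected for this metric: it divides the word-level edit distance by the reference length, so a hypothesis with many extra words is penalized without an upper bound. The card does not say which implementation or text normalization produced its scores, so the following is only a minimal sketch of the standard definition, using invented example sentences and a hypothetical helper name `word_error_rate`:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by reference length, as a percentage."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# Invented sentences, not taken from the evaluation set.
reference = "the weather is nice"
repetitive = "the weather the weather the weather is nice is nice"

print(word_error_rate(reference, reference))
print(word_error_rate(reference, repetitive))
```

The second hypothesis contains every reference word in order but adds six extra words against a four-word reference, which is how repetitive early-training output pushes WER past 100.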

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.02
  • training_steps: 4000
  • mixed_precision_training: Native AMP
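
To make the schedule settings above concrete, here is a small sketch of a linear warmup and decay curve built from the listed values (learning rate 0.0001, warmup ratio 0.02, 4000 training steps). Two things are assumptions rather than facts from this card: that the warmup length is the ratio times the step count, rounded up, and that the rate decays linearly to zero at the final step. The constant and function names are illustrative only.

```python
import math

LEARNING_RATE = 1e-4    # learning_rate from the list above
TRAINING_STEPS = 4000   # training_steps from the list above
WARMUP_RATIO = 0.02     # lr_scheduler_warmup_ratio from the list above

# Assumption: warmup steps are the ratio times total steps, rounded up.
WARMUP_STEPS = math.ceil(TRAINING_STEPS * WARMUP_RATIO)


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < WARMUP_STEPS:
        # Ramp up from 0 to the peak rate over the warmup steps.
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    # Assumption: decay linearly from the peak rate to 0 at the last step.
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / max(1, TRAINING_STEPS - WARMUP_STEPS)


print("warmup steps:", WARMUP_STEPS)
for step in (0, 40, 80, 2040, 4000):
    print(step, f"{linear_lr(step):.2e}")
```

Under these assumptions the peak rate is reached at step 80, so the warmup ends before the first evaluation at step 100.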

Training results

| Training Loss | Epoch | Step | Validation Loss | BLEU | chrF | WER |
|---|---|---|---|---|---|---|
| 2.6534 | 0.0138 | 100 | 2.2446 | 1.43 | 15.99 | 269.1130 |
| 2.4519 | 0.0276 | 200 | 2.1941 | 2.13 | 18.36 | 250.5178 |
| 2.2928 | 0.0414 | 300 | 2.0086 | 7.14 | 25.95 | 128.3656 |
| 2.233 | 0.0552 | 400 | 2.0239 | 5.61 | 24.25 | 134.0837 |
| 2.0406 | 0.0690 | 500 | 1.9215 | 5.64 | 25.65 | 183.8361 |
| 2.0273 | 0.0828 | 600 | 1.8556 | 13.41 | 30.96 | 83.7010 |
| 1.895 | 0.0966 | 700 | 1.8278 | 7.02 | 26.82 | 158.2170 |
| 1.9889 | 0.1103 | 800 | 1.7842 | 12.22 | 31.62 | 99.6398 |
| 1.8484 | 0.1241 | 900 | 1.7648 | 10.97 | 30.45 | 91.1751 |
| 1.7491 | 0.1379 | 1000 | 1.7498 | 10.0 | 29.42 | 109.0050 |
| 1.699 | 0.1517 | 1100 | 1.6662 | 12.53 | 34.87 | 109.9054 |
| 1.6959 | 0.1655 | 1200 | 1.6287 | 14.54 | 34.8 | 92.3008 |
| 1.6682 | 0.1793 | 1300 | 1.5800 | 13.26 | 33.5 | 103.0617 |
| 1.6625 | 0.1931 | 1400 | 1.6115 | 19.71 | 37.33 | 75.9118 |
| 1.5462 | 0.2069 | 1500 | 1.4993 | 18.3 | 39.49 | 93.7866 |
| 1.3834 | 0.2207 | 1600 | 1.4906 | 20.32 | 40.87 | 79.2436 |
| 1.39 | 0.2345 | 1700 | 1.4752 | 17.3 | 38.16 | 93.1562 |
| 1.5061 | 0.2483 | 1800 | 1.4004 | 20.11 | 39.69 | 81.0446 |
| 1.4125 | 0.2621 | 1900 | 1.3854 | 23.82 | 42.67 | 73.3904 |
| 1.3181 | 0.2759 | 2000 | 1.3979 | 20.57 | 40.87 | 78.8384 |
| 1.283 | 0.2897 | 2100 | 1.3446 | 17.97 | 40.47 | 88.8789 |
| 1.2061 | 0.3034 | 2200 | 1.3130 | 25.12 | 45.42 | 73.5254 |
| 1.2091 | 0.3172 | 2300 | 1.3274 | 22.12 | 43.56 | 79.8739 |
| 1.1264 | 0.3310 | 2400 | 1.2771 | 22.94 | 45.96 | 78.2080 |
| 1.0972 | 0.3448 | 2500 | 1.2858 | 24.38 | 46.04 | 75.4615 |
| 1.0822 | 0.3586 | 2600 | 1.2376 | 27.39 | 48.34 | 67.6722 |
| 1.0316 | 0.3724 | 2700 | 1.2461 | 28.0 | 47.61 | 68.5277 |
| 1.165 | 0.3862 | 2800 | 1.1869 | 26.05 | 48.13 | 71.6794 |
| 1.025 | 0.4 | 2900 | 1.1716 | 27.14 | 47.91 | 68.7528 |
| 0.8978 | 0.4138 | 3000 | 1.1628 | 28.34 | 49.15 | 65.6461 |
| 0.9146 | 0.4276 | 3100 | 1.1703 | 25.81 | 48.42 | 71.7244 |
| 0.9764 | 0.4414 | 3200 | 1.1526 | 29.63 | 51.22 | 67.3570 |
| 0.9455 | 0.4552 | 3300 | 1.1108 | 25.31 | 49.73 | 72.6249 |
| 0.9073 | 0.4690 | 3400 | 1.1085 | 27.7 | 50.85 | 72.7150 |
| 0.8596 | 0.4828 | 3500 | 1.0927 | 28.34 | 52.39 | 67.9424 |
| 0.8241 | 0.4966 | 3600 | 1.1026 | 29.95 | 51.37 | 65.2859 |
| 0.8436 | 0.5103 | 3700 | 1.0718 | 27.18 | 51.45 | 71.2292 |
| 0.8318 | 0.5241 | 3800 | 1.0678 | 30.71 | 53.35 | 64.3404 |
| 0.8262 | 0.5379 | 3900 | 1.0534 | 27.05 | 51.94 | 71.5894 |
| 0.8129 | 0.5517 | 4000 | 1.0491 | 27.38 | 51.97 | 72.1747 |
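
The translation metrics in these results do not move in lockstep with validation loss, so the final checkpoint is not necessarily the strongest one on every metric. The sketch below copies the last five evaluation rows from the results above and compares selection by BLEU with selection by validation loss. The `EvalRow` type and variable names are hypothetical, and the card does not state which criterion, if any, was used to pick the released weights.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    step: int
    val_loss: float
    bleu: float
    chrf: float
    wer: float


# Last five evaluation rows, copied from the training results above.
rows = [
    EvalRow(3600, 1.1026, 29.95, 51.37, 65.2859),
    EvalRow(3700, 1.0718, 27.18, 51.45, 71.2292),
    EvalRow(3800, 1.0678, 30.71, 53.35, 64.3404),
    EvalRow(3900, 1.0534, 27.05, 51.94, 71.5894),
    EvalRow(4000, 1.0491, 27.38, 51.97, 72.1747),
]

best_by_bleu = max(rows, key=lambda r: r.bleu)      # higher BLEU is better
best_by_loss = min(rows, key=lambda r: r.val_loss)  # lower loss is better

print("best BLEU:", best_by_bleu.step, best_by_bleu.bleu)
print("lowest validation loss:", best_by_loss.step, best_by_loss.val_loss)
```

Across the full set of results, step 3800 has the best BLEU, chrF, and WER, while step 4000 has the lowest validation loss and supplies the headline numbers reported at the top of this card.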

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.2.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model size: 764M params (Safetensors, tensor type F32)

Evaluation results

  • BLEU on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, Wikimedia, and EUbookshop (self-reported): 27.380
  • WER on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, Wikimedia, and EUbookshop (self-reported): 72.175