Whisper Small GA-EN Speech Translation

This model is a fine-tuned version of openai/whisper-small, trained on a combination of the IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia datasets plus augmented data. It achieves the following results on the evaluation set:

  • Loss: 1.3822
  • BLEU: 30.9
  • ChrF: 46.57
  • WER: 62.2692
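
For reference, here is a minimal sketch of how corpus-level BLEU, ChrF, and WER can be computed with the Hugging Face `evaluate` library. The hypothesis and reference strings are placeholders, not data from this model, and this is not necessarily the script used to produce the numbers above.

```python
# Hedged sketch: computing BLEU, ChrF, and WER with the `evaluate` library.
# The hypotheses/references below are placeholders for illustration only.
import evaluate

bleu = evaluate.load("sacrebleu")
chrf = evaluate.load("chrf")
wer = evaluate.load("wer")

hypotheses = ["the boy is eating an apple"]  # model outputs (placeholder)
references = ["the boy eats an apple"]       # gold translations (placeholder)

# sacreBLEU and ChrF expect a list of references per hypothesis.
print(bleu.compute(predictions=hypotheses, references=[[r] for r in references])["score"])
print(chrf.compute(predictions=hypotheses, references=[[r] for r in references])["score"])
# WER is returned as a fraction; multiply by 100 for the percentage reported above.
print(100 * wer.compute(predictions=hypotheses, references=references))
```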

Model description

Whisper Small fine-tuned for speech translation from Irish (GA) to English (EN): the model takes Irish speech as input and directly generates English text.

Intended uses & limitations

The model is intended for translating Irish (Gaeilge) speech directly into English text. Its limitations are not yet documented; note that the evaluation WER of roughly 62% indicates outputs can still diverge substantially from reference translations, so they should be reviewed before downstream use.
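
Until the card is expanded, the following is a minimal, untested inference sketch using the `transformers` ASR pipeline. The audio path is a placeholder; Whisper expects 16 kHz mono input, and the pipeline resamples decodable audio files automatically.

```python
# Minimal inference sketch (assumptions: a local audio file, ffmpeg available).
import torch
from transformers import pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"

translator = pipeline(
    "automatic-speech-recognition",
    model="ymoslem/whisper-small-ga2en-v5.2.2",
    device=device,
)

# task="translate" asks Whisper to emit English text for the Irish audio.
result = translator("irish_speech.wav", generate_kwargs={"task": "translate"})
print(result["text"])
```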

Training and evaluation data

The model was fine-tuned and evaluated on a combination of the IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia Irish-English speech translation datasets, plus augmented data, as listed above; see the loading sketch below.
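
As an illustration, one of the public sources can be loaded as follows. This is a sketch: `google/fleurs` with the Irish config `ga_ie` is a public dataset on the Hub, but the Hub IDs and splits of the remaining sources are not documented in this card.

```python
# Sketch: loading the Irish split of FLEURS with the `datasets` library.
# Only google/fleurs is shown; the other sources' Hub IDs are not given here.
from datasets import load_dataset

fleurs_ga = load_dataset("google/fleurs", "ga_ie", split="train")
sample = fleurs_ga[0]
print(sample["audio"]["sampling_rate"])  # FLEURS audio is 16 kHz
print(sample["transcription"])           # Irish transcription
```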

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 64
  • eval_batch_size: 64
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • training_steps: 4000
  • mixed_precision_training: Native AMP
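
These settings map onto `Seq2SeqTrainingArguments` roughly as follows. This is a hedged reconstruction, not the author's actual script: `output_dir` is hypothetical, and the eval cadence is inferred from the 100-step intervals in the results table below.

```python
# Hedged reconstruction of the training configuration (not the original script).
# Adam betas=(0.9, 0.999) and epsilon=1e-8 are the Trainer defaults.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-ga2en",  # hypothetical path
    learning_rate=1e-4,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    seed=42,
    lr_scheduler_type="linear",
    max_steps=4000,
    fp16=True,                           # "Native AMP" mixed precision
    evaluation_strategy="steps",
    eval_steps=100,                      # matches the eval cadence in the table below
    predict_with_generate=True,          # generate text so BLEU/ChrF/WER can be scored
)
```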

Training results

| Training Loss | Epoch  | Step | Validation Loss | BLEU  | ChrF  | WER      |
|:-------------:|:------:|:----:|:---------------:|:-----:|:-----:|:--------:|
| 2.3533        | 0.0438 | 100  | 1.7789          | 6.29  | 25.08 | 148.7618 |
| 1.9035        | 0.0876 | 200  | 1.5122          | 18.21 | 34.02 | 85.6821  |
| 1.5357        | 0.1313 | 300  | 1.3983          | 14.01 | 33.7  | 93.3363  |
| 1.3056        | 0.1751 | 400  | 1.3447          | 18.12 | 37.35 | 95.0023  |
| 1.1177        | 0.2189 | 500  | 1.3168          | 18.47 | 38.44 | 95.3624  |
| 0.984         | 0.2627 | 600  | 1.3202          | 26.82 | 41.23 | 67.3120  |
| 0.8945        | 0.3065 | 700  | 1.2947          | 26.73 | 42.53 | 67.1319  |
| 0.7508        | 0.3503 | 800  | 1.2476          | 25.67 | 42.06 | 74.2008  |
| 0.7127        | 0.3940 | 900  | 1.2630          | 22.59 | 41.05 | 75.7767  |
| 0.5944        | 0.4378 | 1000 | 1.2726          | 22.37 | 40.31 | 82.4854  |
| 0.4972        | 0.4816 | 1100 | 1.2898          | 22.88 | 42.52 | 82.5304  |
| 0.4517        | 0.5254 | 1200 | 1.2509          | 27.99 | 44.42 | 64.1603  |
| 0.3885        | 0.5692 | 1300 | 1.2887          | 29.58 | 44.8  | 63.1247  |
| 0.3337        | 0.6130 | 1400 | 1.2645          | 30.05 | 45.5  | 62.6294  |
| 0.2852        | 0.6567 | 1500 | 1.2972          | 28.2  | 43.57 | 68.6628  |
| 0.2583        | 0.7005 | 1600 | 1.2716          | 28.21 | 45.06 | 73.6155  |
| 0.2016        | 0.7443 | 1700 | 1.3346          | 27.55 | 43.21 | 74.3809  |
| 0.1883        | 0.7881 | 1800 | 1.3124          | 21.45 | 41.83 | 94.1018  |
| 0.1514        | 0.8319 | 1900 | 1.3178          | 28.2  | 44.09 | 63.7551  |
| 0.1311        | 0.8757 | 2000 | 1.3246          | 27.33 | 43.25 | 74.3359  |
| 0.1128        | 0.9194 | 2100 | 1.3464          | 25.21 | 42.93 | 83.2508  |
| 0.0994        | 0.9632 | 2200 | 1.3315          | 30.51 | 45.74 | 64.7456  |
| 0.0512        | 1.0070 | 2300 | 1.3377          | 30.89 | 46.44 | 63.3498  |
| 0.0447        | 1.0508 | 2400 | 1.3587          | 28.72 | 44.36 | 64.3404  |
| 0.0368        | 1.0946 | 2500 | 1.3619          | 31.53 | 46.56 | 61.9541  |
| 0.0281        | 1.1384 | 2600 | 1.3596          | 30.98 | 46.45 | 70.4638  |
| 0.0273        | 1.1821 | 2700 | 1.3656          | 32.09 | 46.85 | 62.1792  |
| 0.0287        | 1.2259 | 2800 | 1.3547          | 32.57 | 47.04 | 62.0891  |
| 0.025         | 1.2697 | 2900 | 1.3539          | 26.94 | 45.43 | 81.1796  |
| 0.0263        | 1.3135 | 3000 | 1.3512          | 30.11 | 46.73 | 71.4993  |
| 0.0301        | 1.3573 | 3100 | 1.3510          | 31.14 | 46.93 | 62.0891  |
| 0.0263        | 1.4011 | 3200 | 1.3853          | 31.64 | 46.98 | 61.6389  |
| 0.027         | 1.4448 | 3300 | 1.4148          | 29.63 | 45.91 | 65.1058  |
| 0.0286        | 1.4886 | 3400 | 1.3828          | 30.12 | 46.2  | 62.7195  |
| 0.0218        | 1.5324 | 3500 | 1.3890          | 30.41 | 46.28 | 64.8807  |
| 0.0231        | 1.5762 | 3600 | 1.3898          | 31.05 | 46.72 | 63.3498  |
| 0.0193        | 1.6200 | 3700 | 1.3836          | 30.05 | 45.7  | 62.4944  |
| 0.0184        | 1.6637 | 3800 | 1.3732          | 30.95 | 47.17 | 61.8640  |
| 0.0168        | 1.7075 | 3900 | 1.3780          | 30.9  | 46.91 | 62.1342  |
| 0.0168        | 1.7513 | 4000 | 1.3822          | 30.9  | 46.57 | 62.2692  |

Framework versions

  • Transformers 4.40.2
  • Pytorch 2.2.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1