Whisper Medium GA-EN Speech Translation

This model is a fine-tuned version of openai/whisper-medium for Irish-to-English (GA-EN) speech translation, trained on the IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, Wikimedia, and EUbookshop datasets. At the final checkpoint (step 4000), it achieves the following results on the evaluation set:

Loss: 1.0576
Bleu: 31.02
Chrf: 53.51
Wer: 68.5277
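
The WER figures on this card are percentages, and several early checkpoints in the training results report values above 100. That is possible because WER counts insertions as errors, so a hypothesis much longer than its reference can contain more errors than the reference has words. The card does not say which tool computed the metric; the sketch below is a plain-Python version of the standard word-level edit-distance definition, and the example sentences are invented for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: (substitutions + deletions + insertions) / reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row of the table.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current

    return 100.0 * previous[-1] / len(ref_words)


# Invented sentences, not taken from the evaluation set.
print(word_error_rate("the cat sat", "the cat sat"))
print(word_error_rate("the cat sat", "the dog sat"))
print(word_error_rate("good morning", "good morning to you all my friends"))
```

The last call has five inserted words against a two-word reference, which is how a score well above 100 arises.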

Model description

More information needed

Intended uses & limitations

More information needed
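
Until this section is filled in, the following is a minimal sketch of how the checkpoint might be tried with the Transformers `pipeline` API. It makes several assumptions: the audio path is a hypothetical placeholder, decoding an audio file from a path requires ffmpeg on the system, and the checkpoint's default generation settings are assumed to produce the English translation without extra arguments.

```python
MODEL_ID = "ymoslem/whisper-medium-ga2en-v6.3.0-r"


def translate_irish_audio(audio_path: str) -> str:
    """Return the English text the model generates for one Irish audio file."""
    # Imported here so the module can be loaded without transformers installed.
    from transformers import pipeline

    # Downloads the checkpoint from the Hugging Face Hub on first use.
    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)

    # Assumption: the generation config stored with the checkpoint is used as-is.
    result = asr(audio_path)
    return result["text"]


# Example call (hypothetical file name):
# print(translate_irish_audio("irish_sample.wav"))
```

Building the pipeline once and reusing it across files avoids reloading the weights on every call.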

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.03
training_steps: 4000
mixed_precision_training: Native AMP
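
To make the schedule concrete, the sketch below works out the learning rate implied by the values above: a warmup ratio of 0.03 over 4000 steps gives 120 warmup steps, followed by a linear decay to zero. It is a plain-Python approximation of a standard linear-with-warmup schedule, not code from the training run, and the function name is illustrative.

```python
import math

LEARNING_RATE = 1e-4   # learning_rate
TRAINING_STEPS = 4000  # training_steps
WARMUP_RATIO = 0.03    # lr_scheduler_warmup_ratio

# Warmup length in steps, rounded up.
WARMUP_STEPS = math.ceil(TRAINING_STEPS * WARMUP_RATIO)


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step under linear warmup and linear decay."""
    if step < WARMUP_STEPS:
        # Ramp up from 0 to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Decay from the peak down to 0 at the final step.
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)


for step in (0, 60, WARMUP_STEPS, 2060, TRAINING_STEPS):
    print(step, linear_lr(step))
```

Under these assumptions the peak rate is reached at step 120, and the rate is back to half the peak at step 2060.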

Training results

Training Loss	Epoch	Step	Validation Loss	Bleu	Chrf	Wer
2.5374	0.0138	100	2.1201	2.56	18.92	222.4674
2.446	0.0276	200	2.1960	3.07	20.56	170.5088
2.2819	0.0414	300	1.9811	5.87	25.17	114.5880
2.1904	0.0552	400	1.9974	8.41	25.65	99.1896
2.026	0.0690	500	1.8961	7.99	27.64	130.7069
2.0448	0.0828	600	1.9410	9.15	27.78	104.9077
1.8606	0.0966	700	1.8451	9.57	29.34	110.4908
1.9887	0.1103	800	1.7419	13.44	32.32	84.3314
1.8633	0.1241	900	1.7376	13.43	31.58	102.1162
1.7576	0.1379	1000	1.6879	11.9	32.68	106.6186
1.7142	0.1517	1100	1.7571	12.4	33.66	102.6114
1.7168	0.1655	1200	1.6003	17.35	36.55	87.9784
1.6741	0.1793	1300	1.5883	15.41	35.46	92.8411
1.6534	0.1931	1400	1.5366	17.12	37.24	90.2296
1.58	0.2069	1500	1.5141	17.49	38.5	92.1207
1.403	0.2207	1600	1.4606	16.78	39.13	88.9689
1.3806	0.2345	1700	1.4263	19.26	40.02	86.7177
1.5111	0.2483	1800	1.4060	18.4	39.47	92.2557
1.4261	0.2621	1900	1.3911	21.19	42.13	78.7033
1.2974	0.2759	2000	1.3871	15.6	38.66	100.3152
1.2694	0.2897	2100	1.3527	16.21	39.99	91.2652
1.204	0.3034	2200	1.3232	20.2	41.18	86.8978
1.1922	0.3172	2300	1.3338	16.44	40.85	103.1968
1.1237	0.3310	2400	1.2830	19.29	43.73	94.4620
1.0989	0.3448	2500	1.2844	25.11	46.84	75.0563
1.0766	0.3586	2600	1.2578	23.87	46.1	74.5160
1.0432	0.3724	2700	1.2414	22.31	44.91	86.9878
1.1588	0.3862	2800	1.2051	23.32	45.94	77.1724
1.0062	0.4	2900	1.2059	26.15	48.27	69.4282
0.9178	0.4138	3000	1.1756	29.13	48.92	64.7456
0.9108	0.4276	3100	1.1665	28.34	48.9	67.2220
0.9868	0.4414	3200	1.1489	25.64	48.93	75.3264
0.9563	0.4552	3300	1.1181	27.58	49.67	71.8145
0.9138	0.4690	3400	1.1247	28.37	50.96	71.4543
0.8508	0.4828	3500	1.1007	29.75	51.41	68.3476
0.836	0.4966	3600	1.1114	30.99	52.2	66.5916
0.8435	0.5103	3700	1.0782	30.64	52.77	68.2125
0.8323	0.5241	3800	1.0744	29.78	52.94	68.9779
0.818	0.5379	3900	1.0639	31.23	53.21	67.7623
0.8095	0.5517	4000	1.0576	31.02	53.51	68.5277
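
The final checkpoint is not the best on every metric. The sketch below parses a few rows copied from the table above and picks the best checkpoint per metric; the excerpt is limited to seven rows for brevity, and the type and helper names are illustrative. The same winners come out when the full table is used.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    bleu: float
    chrf: float
    wer: float


# Rows copied verbatim from the training results table (steps 3000 and 3500-4000).
EXCERPT = """
0.9178 0.4138 3000 1.1756 29.13 48.92 64.7456
0.8508 0.4828 3500 1.1007 29.75 51.41 68.3476
0.836  0.4966 3600 1.1114 30.99 52.2  66.5916
0.8435 0.5103 3700 1.0782 30.64 52.77 68.2125
0.8323 0.5241 3800 1.0744 29.78 52.94 68.9779
0.818  0.5379 3900 1.0639 31.23 53.21 67.7623
0.8095 0.5517 4000 1.0576 31.02 53.51 68.5277
"""


def parse_rows(text: str) -> list:
    """Turn whitespace-separated table lines into EvalRow records."""
    rows = []
    for line in text.strip().splitlines():
        f = line.split()
        rows.append(EvalRow(float(f[0]), float(f[1]), int(f[2]),
                            float(f[3]), float(f[4]), float(f[5]), float(f[6])))
    return rows


rows = parse_rows(EXCERPT)
best_bleu = max(rows, key=lambda r: r.bleu)   # higher is better
best_chrf = max(rows, key=lambda r: r.chrf)   # higher is better
lowest_wer = min(rows, key=lambda r: r.wer)   # lower is better

print("best BLEU:", best_bleu.step, best_bleu.bleu)
print("best chrF:", best_chrf.step, best_chrf.chrf)
print("lowest WER:", lowest_wer.step, lowest_wer.wer)
```

BLEU peaks at step 3900 and WER is lowest at step 3000, while chrF and validation loss are best at the final step.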

Framework versions

Transformers 4.41.2
Pytorch 2.2.0+cu121
Datasets 2.20.0
Tokenizers 0.19.1
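
When reproducing results, it can help to compare the local environment against the versions listed above. The sketch below does so with the standard library's `importlib.metadata`; the mapping of card names to package names (for example "Pytorch" to `torch`) is an assumption, as is the helper name.

```python
from importlib import metadata

# Versions from this card, keyed by the assumed pip package names.
EXPECTED_VERSIONS = {
    "transformers": "4.41.2",
    "torch": "2.2.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def compare_versions(expected: dict) -> dict:
    """Map each package to (expected version, installed version or None, exact match)."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed, installed == wanted)
    return report


for package, (wanted, installed, matches) in compare_versions(EXPECTED_VERSIONS).items():
    status = "ok" if matches else "differs"
    print(f"{package}: expected {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; it only flags a difference from the training environment.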

Model ID: ymoslem/whisper-medium-ga2en-v6.3.0-r
