ymoslem
/

whisper-medium-ga2en-v2

	---
	language:
	- ga
	- en
	license: apache-2.0
	base_model: openai/whisper-medium
	tags:
	- generated_from_trainer
	datasets:
	- ymoslem/IWSLT2023-GA-EN
	- ymoslem/FLEURS-GA-EN
	- ymoslem/BitesizeIrish-GA-EN
	- ymoslem/SpokenWords-GA-EN-MTed
	metrics:
	- bleu
	- wer
	model-index:
	- name: Whisper Medium GA-EN Speech Translation
	  results:
	  - task:
	      name: Automatic Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia
	      type: ymoslem/IWSLT2023-GA-EN
	    metrics:
	    - name: Bleu
	      type: bleu
	      value: 32.14
	    - name: Wer
	      type: wer
	      value: 65.96127870328681
	---


	# Whisper Medium GA-EN Speech Translation

	This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) on the IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia datasets.
	The best model checkpoint (this version) is from step 1400, epoch 1.84 (4 x 0.46, where 0.46 is the epoch value logged at that step), and it achieves the following results on the evaluation set:
	- Loss: 1.0839
	- Bleu: 33.55
	- Chrf: 50.95
	- Wer: 60.1981

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed
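As a starting point for usage, the checkpoint can presumably be loaded with the Transformers `pipeline` API like other Whisper fine-tunes. The sketch below is an assumption rather than the author's documented recipe: `ymoslem/whisper-medium-ga2en-v2` is this model's Hub ID, while the helper name `translate_irish_speech` and the audio file name are hypothetical. Decoding an audio file this way typically also requires `ffmpeg` to be installed.

```python
MODEL_ID = "ymoslem/whisper-medium-ga2en-v2"


def translate_irish_speech(audio_path: str, device: int = -1) -> str:
    """Return the English text the model produces for an Irish audio file.

    device=-1 runs on CPU; pass a GPU index such as 0 to run on CUDA.
    """
    # Imported lazily so that defining this helper does not require the library.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        device=device,
    )
    # The pipeline returns a dict whose "text" field holds the decoded output.
    result = asr(audio_path)
    return result["text"]


# Hypothetical usage (downloads the model weights on first call):
# print(translate_irish_speech("sample_ga.wav"))
```

Longer recordings may need chunking or other generation settings; those are left out here because the card does not specify them.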

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0001
	- train_batch_size: 16
	- eval_batch_size: 16
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_ratio: 0.03
	- training_steps: 2000
	- mixed_precision_training: Native AMP
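To make the schedule concrete, the sketch below computes the learning rate at a given step for a linear scheduler with warmup, using the values listed above. It reads the 0.03 warmup value as a fraction of the 2000 training steps (an assumption, since 0.03 cannot be a step count), which gives 60 warmup steps. The function name `linear_schedule_lr` is hypothetical, and the formula mirrors the usual linear warmup and decay shape rather than reproducing the exact training code.

```python
import math

LEARNING_RATE = 1e-4   # learning_rate from the list above
TRAINING_STEPS = 2000  # training_steps from the list above
WARMUP_RATIO = 0.03    # assumption: 0.03 is a ratio of total steps


def linear_schedule_lr(
    step: int,
    peak_lr: float = LEARNING_RATE,
    total_steps: int = TRAINING_STEPS,
    warmup_ratio: float = WARMUP_RATIO,
) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up linearly from 0 to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Decay linearly from the peak down to 0 at the final step.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 60, 1400, 2000):
    print(step, linear_schedule_lr(step))
```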

	### Hardware

	4 x NVIDIA A40 GPUs (48 GB VRAM each), with a batch size of 4 per GPU (total: 16)

	### Training results

	| Training Loss | Epoch | Step | Bleu  | Chrf  | Validation Loss | Wer      |
	|:-------------:|:-----:|:----:|:-----:|:-----:|:---------------:|:--------:|
	| 2.9468        | 0.03  | 100  | 4.72  | 20.55 | 2.2829          | 120.6213 |
	| 2.5074        | 0.07  | 200  | 7.81  | 25.23 | 2.0136          | 114.8131 |
	| 2.2406        | 0.1   | 300  | 11.24 | 29.39 | 1.8224          | 95.9928  |
	| 2.2466        | 0.13  | 400  | 16.01 | 34.73 | 1.6530          | 83.4309  |
	| 2.0276        | 0.16  | 500  | 16.69 | 34.76 | 1.5344          | 94.2368  |
	| 1.8429        | 0.2   | 600  | 21.37 | 37.48 | 1.4923          | 78.5682  |
	| 1.7621        | 0.23  | 700  | 23.4  | 40.89 | 1.3666          | 74.3359  |
	| 1.5629        | 0.26  | 800  | 24.76 | 44.63 | 1.2876          | 76.6321  |
	| 1.5458        | 0.3   | 900  | 25.81 | 44.59 | 1.2178          | 72.6249  |
	| 1.2971        | 0.33  | 1000 | 27.63 | 46.91 | 1.1823          | 70.2837  |
	| 1.3852        | 0.36  | 1100 | 27.18 | 46.16 | 1.2303          | 70.6889  |
	| 1.309         | 0.39  | 1200 | 27.65 | 47.41 | 1.1573          | 72.0396  |
	| 1.1818        | 0.43  | 1300 | 31.17 | 48.36 | 1.1304          | 61.6389  |
	| 1.2711        | 0.46  | 1400 | 33.55 | 50.95 | 1.0839          | 60.1981  |
	| 1.1305        | 0.49  | 1500 | 30.37 | 50.78 | 1.0718          | 68.6628  |
	| 1.0544        | 0.53  | 1600 | 26.98 | 48.1  | 1.1109          | 73.7506  |
	| 1.125         | 0.56  | 1700 | 30.76 | 50.19 | 1.0709          | 61.7740  |
	| 1.1348        | 0.59  | 1800 | 33.71 | 50.6  | 1.0530          | 59.9280  |
	| 1.14          | 0.62  | 1900 | 31.45 | 50.16 | 1.0392          | 66.9068  |
	| 1.1059        | 0.66  | 2000 | 32.14 | 50.84 | 1.0240          | 65.9613  |
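The metrics in this table do not all peak at the same step, so the choice of "best" checkpoint depends on the criterion. The sketch below copies the evaluation columns from the table and picks the best row under each metric. The card does not state which criterion selected step 1400; the only observation made here is that chrF is the metric in the table that peaks at that step. The `EvalRow` type and variable names are hypothetical.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    step: int
    bleu: float
    chrf: float
    val_loss: float
    wer: float


# Evaluation columns copied from the training results table.
ROWS = [
    EvalRow(100, 4.72, 20.55, 2.2829, 120.6213),
    EvalRow(200, 7.81, 25.23, 2.0136, 114.8131),
    EvalRow(300, 11.24, 29.39, 1.8224, 95.9928),
    EvalRow(400, 16.01, 34.73, 1.6530, 83.4309),
    EvalRow(500, 16.69, 34.76, 1.5344, 94.2368),
    EvalRow(600, 21.37, 37.48, 1.4923, 78.5682),
    EvalRow(700, 23.4, 40.89, 1.3666, 74.3359),
    EvalRow(800, 24.76, 44.63, 1.2876, 76.6321),
    EvalRow(900, 25.81, 44.59, 1.2178, 72.6249),
    EvalRow(1000, 27.63, 46.91, 1.1823, 70.2837),
    EvalRow(1100, 27.18, 46.16, 1.2303, 70.6889),
    EvalRow(1200, 27.65, 47.41, 1.1573, 72.0396),
    EvalRow(1300, 31.17, 48.36, 1.1304, 61.6389),
    EvalRow(1400, 33.55, 50.95, 1.0839, 60.1981),
    EvalRow(1500, 30.37, 50.78, 1.0718, 68.6628),
    EvalRow(1600, 26.98, 48.1, 1.1109, 73.7506),
    EvalRow(1700, 30.76, 50.19, 1.0709, 61.7740),
    EvalRow(1800, 33.71, 50.6, 1.0530, 59.9280),
    EvalRow(1900, 31.45, 50.16, 1.0392, 66.9068),
    EvalRow(2000, 32.14, 50.84, 1.0240, 65.9613),
]

# Higher is better for BLEU and chrF; lower is better for WER and loss.
best_by_chrf = max(ROWS, key=lambda row: row.chrf)
best_by_bleu = max(ROWS, key=lambda row: row.bleu)
best_by_wer = min(ROWS, key=lambda row: row.wer)
best_by_loss = min(ROWS, key=lambda row: row.val_loss)

print("chrF:", best_by_chrf.step)
print("BLEU:", best_by_bleu.step)
print("WER: ", best_by_wer.step)
print("loss:", best_by_loss.step)
```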


	### Framework versions

	- Transformers 4.39.3
	- PyTorch 2.0.1+cu118
	- Datasets 2.18.0
	- Tokenizers 0.15.2