	---
	library_name: transformers
	tags:
	- generated_from_trainer
	metrics:
	- bleu
	model-index:
	- name: whisper-small-es-ja
	results: []
	datasets:
	- Marianoleiras/voxpopuli_es-ja
	language:
	- es
	- ja
	base_model:
	- openai/whisper-small
	---


	# whisper-small-es-ja

	## Model Overview
	This model was developed as part of a workshop organized by Yasmin Moslem, focusing on speech-to-text pipelines.
	The workshop's primary goal was to enable accurate transcription and translation of spoken source languages into written target languages while learning about end-to-end and cascaded approaches in the process.

	This model is an end-to-end solution for Spanish-to-Japanese speech-to-text (STT) tasks. It is a fine-tuned version of OpenAI's Whisper-small, trained on the [Marianoleiras/voxpopuli_es-ja](https://huggingface.co/datasets/Marianoleiras/voxpopuli_es-ja) dataset.
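
A minimal usage sketch, assuming the checkpoint loads through the standard `transformers` automatic-speech-recognition pipeline, as Whisper checkpoints generally do. The helper names and the audio path are hypothetical, and generation settings (such as any forced language or task tokens) are left at whatever the checkpoint ships with.

```python
MODEL_ID = "Marianoleiras/whisper-small-es-ja"  # repository name from this card


def load_stt(model_id: str = MODEL_ID):
    """Build a speech-to-text pipeline (downloads the checkpoint on first use)."""
    # Imported lazily so the helper below works without transformers installed.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


def spanish_audio_to_japanese(audio, stt) -> str:
    """Run the pipeline on one audio input and return the decoded text."""
    # The ASR pipeline returns a dict whose "text" entry holds the output string.
    return stt(audio)["text"].strip()


# Example call (hypothetical file name, requires network access and ffmpeg):
#   stt = load_stt()
#   print(spanish_audio_to_japanese("sample_es.wav", stt))
```

Passing the pipeline in as an argument keeps the model load, which is slow, separate from the per-file call.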

	The model achieves the following results on the provided dataset:

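	Evaluation Set (these figures match the step-2750 checkpoint, the highest validation BLEU in the training results):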
	- Loss: 1.1724
	- BLEU: 22.2850

	Test Set:
	- BLEU: 20.8607
	- ChrF++: 23.3571
	- Comet: 77.6979

	(Baseline evaluation on test set: BLEU 0.4793)
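
The card does not say which tool or tokenizer produced these scores. As a sketch only, BLEU and ChrF++ for Japanese output are often computed with `sacrebleu`, using its `ja-mecab` tokenizer for BLEU and `word_order=2` for ChrF++. Both choices are assumptions here, so the numbers above may not reproduce exactly. COMET is not covered.

```python
def score_translations(hypotheses, references, tokenize="ja-mecab"):
    """Return corpus-level BLEU and ChrF++ for parallel lists of strings.

    Assumption: sacrebleu is installed; "ja-mecab" additionally needs the
    optional Japanese extras (pip install "sacrebleu[ja]").
    """
    import sacrebleu  # lazy import so the module loads without the dependency

    refs = [list(references)]  # sacrebleu expects one list per reference set
    bleu = sacrebleu.corpus_bleu(list(hypotheses), refs, tokenize=tokenize)
    # word_order=2 turns chrF into chrF++ (character plus word n-grams).
    chrf = sacrebleu.corpus_chrf(list(hypotheses), refs, word_order=2)
    return {"bleu": bleu.score, "chrf++": chrf.score}
```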

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-05
	- train_batch_size: 16
	- eval_batch_size: 8
	- seed: 42
	- distributed_type: multi-GPU
	- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
	- lr_scheduler_type: linear
	- training_steps: 3500
	- mixed_precision_training: Native AMP
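
The list above can be mapped onto `Seq2SeqTrainingArguments` roughly as follows. This is a sketch, not the author's training script: it assumes the batch sizes are per device, that "Native AMP" means `fp16`, and that no warmup was used (none is listed). `build_training_args` and `linear_lr` are illustrative helper names.

```python
# Values copied from the hyperparameter list; the keys are the
# Seq2SeqTrainingArguments fields they are assumed to correspond to.
TRAINING_KWARGS = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 16,  # assumption: 16 is per device
    "per_device_eval_batch_size": 8,    # assumption: 8 is per device
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "max_steps": 3500,
    "fp16": True,  # assumption: "Native AMP" refers to fp16
}


def build_training_args(output_dir: str = "whisper-small-es-ja"):
    """Construct the arguments object (needs transformers and a GPU for fp16)."""
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)


def linear_lr(step, base_lr=1e-5, total_steps=3500, warmup_steps=0):
    """Learning rate at `step` under a linear schedule with optional warmup."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)
```

Under these assumptions the learning rate falls from 1e-05 at step 0 to half that at step 1750 and reaches zero at step 3500.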

	### Training results

	| Training Loss | Epoch  | Step | Bleu    | Validation Loss |
	|:-------------:|:------:|:----:|:-------:|:---------------:|
	| 1.5787        | 0.3962 | 250  | 11.6756 | 1.5196          |
	| 1.3535        | 0.7924 | 500  | 16.0514 | 1.3470          |
	| 1.0658        | 1.1886 | 750  | 17.7743 | 1.2533          |
	| 1.0303        | 1.5848 | 1000 | 19.1894 | 1.2046          |
	| 0.9893        | 1.9810 | 1250 | 20.1198 | 1.1591          |
	| 0.7569        | 2.3772 | 1500 | 21.0054 | 1.1546          |
	| 0.7571        | 2.7734 | 1750 | 21.6425 | 1.1378          |
	| 0.5557        | 3.1696 | 2000 | 21.7563 | 1.1500          |
	| 0.5612        | 3.5658 | 2250 | 21.1391 | 1.1395          |
	| 0.5581        | 3.9620 | 2500 | 22.0412 | 1.1343          |
	| 0.4144        | 4.3582 | 2750 | 22.2850 | 1.1724          |
	| 0.4114        | 4.7544 | 3000 | 22.1925 | 1.1681          |
	| 0.3005        | 5.1506 | 3250 | 21.4948 | 1.1947          |
	| 0.2945        | 5.5468 | 3500 | 22.1454 | 1.1921          |
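
The log above can be read programmatically. The sketch below copies the rows as plain tuples and looks up the checkpoint with the highest BLEU, the one with the lowest validation loss, and the approximate number of steps per epoch. The `Row` type and variable names are illustrative.

```python
from typing import NamedTuple


class Row(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    bleu: float
    val_loss: float


# Rows copied from the training results table, in the same column order.
LOG = [
    Row(1.5787, 0.3962, 250, 11.6756, 1.5196),
    Row(1.3535, 0.7924, 500, 16.0514, 1.3470),
    Row(1.0658, 1.1886, 750, 17.7743, 1.2533),
    Row(1.0303, 1.5848, 1000, 19.1894, 1.2046),
    Row(0.9893, 1.9810, 1250, 20.1198, 1.1591),
    Row(0.7569, 2.3772, 1500, 21.0054, 1.1546),
    Row(0.7571, 2.7734, 1750, 21.6425, 1.1378),
    Row(0.5557, 3.1696, 2000, 21.7563, 1.1500),
    Row(0.5612, 3.5658, 2250, 21.1391, 1.1395),
    Row(0.5581, 3.9620, 2500, 22.0412, 1.1343),
    Row(0.4144, 4.3582, 2750, 22.2850, 1.1724),
    Row(0.4114, 4.7544, 3000, 22.1925, 1.1681),
    Row(0.3005, 5.1506, 3250, 21.4948, 1.1947),
    Row(0.2945, 5.5468, 3500, 22.1454, 1.1921),
]

best_bleu = max(LOG, key=lambda r: r.bleu)         # checkpoint with highest BLEU
lowest_val = min(LOG, key=lambda r: r.val_loss)    # checkpoint with lowest loss
steps_per_epoch = LOG[0].step / LOG[0].epoch       # about 631 optimizer steps

print(best_bleu.step, lowest_val.step, round(steps_per_epoch))
```

BLEU peaks at step 2750 while validation loss is lowest at step 2500, and training loss keeps falling afterwards. That pattern may indicate the onset of overfitting in the later steps, though the table alone does not settle it.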


	### Framework versions

	- Transformers 4.47.1
	- PyTorch 2.4.0+cu124
	- Datasets 3.2.0
	- Tokenizers 0.21.0

	## Linked Models

	- [Whisper-Small-es](https://huggingface.co/Marianoleiras/whisper-small-es): The ASR model of the cascaded approach built using this dataset.
	- [NLLB-200-Distilled-es-ja](https://huggingface.co/Marianoleiras/nllb-200-distilled-es-ja): The MT model of the cascaded approach built using this dataset.
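
For comparison, the cascaded approach chains the two linked models: Spanish speech is first transcribed, then the transcript is translated. The sketch below assumes both checkpoints load through the standard `transformers` pipelines and that the MT model uses the usual NLLB language codes (`spa_Latn`, `jpn_Jpan`). The function names are illustrative.

```python
def load_cascade():
    """Load the ASR and MT pipelines (downloads both checkpoints on first use)."""
    from transformers import pipeline  # lazy import, see note below

    asr = pipeline(
        "automatic-speech-recognition",
        model="Marianoleiras/whisper-small-es",
    )
    mt = pipeline(
        "translation",
        model="Marianoleiras/nllb-200-distilled-es-ja",
        src_lang="spa_Latn",  # assumption: standard NLLB code for Spanish
        tgt_lang="jpn_Jpan",  # assumption: standard NLLB code for Japanese
    )
    return asr, mt


def cascade(audio, asr, mt) -> str:
    """Transcribe Spanish audio, then translate the transcript into Japanese."""
    spanish_text = asr(audio)["text"].strip()
    # Translation pipelines return a list with one dict per input string.
    return mt(spanish_text)[0]["translation_text"]


# Example call (hypothetical file name):
#   asr, mt = load_cascade()
#   print(cascade("sample_es.wav", asr, mt))
```

Because `cascade` only relies on the two callables' return shapes, it can be exercised with stand-in functions before any model is downloaded.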

	## Model Card Contact

	Mariano González ([email protected])