---
library_name: transformers
tags:
- generated_from_trainer
metrics:
- bleu
model-index:
- name: whisper-small-es-ja
results: []
datasets:
- Marianoleiras/voxpopuli_es-ja
language:
- es
- ja
base_model:
- openai/whisper-small
---
# whisper-small-es-ja
## Model Overview
This model was developed as part of a workshop organized by Yasmin Moslem on **speech-to-text pipelines**. The workshop's primary goal was accurate transcription and translation of spoken source languages into written target languages, covering both end-to-end and cascaded approaches.

This model is an **end-to-end solution** for Spanish-to-Japanese speech-to-text (STT) translation. It is a fine-tuned version of OpenAI's [Whisper-small](https://huggingface.co/openai/whisper-small), trained on the **[Marianoleiras/voxpopuli_es-ja](https://huggingface.co/datasets/Marianoleiras/voxpopuli_es-ja)** dataset.

The model achieves the following results:
**Evaluation set:**
- Loss: **1.1724**
- BLEU: **22.2850**

**Test set:**
- BLEU: **20.8607**
- chrF++: **23.3571**
- COMET: **77.6979**

(Baseline evaluation on the test set: BLEU 0.4793)
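A minimal inference sketch (not part of the original card): load the checkpoint through the `transformers` ASR pipeline and translate a Spanish recording into Japanese text. The audio path and the generation flags are assumptions; the checkpoint's saved generation config may already select the language and task.

```python
# Hedged inference sketch: Spanish speech in, Japanese text out.
from transformers import pipeline

translator = pipeline(
    "automatic-speech-recognition",
    model="Marianoleiras/whisper-small-es-ja",
)

# "audio.wav" is a placeholder path to a Spanish recording (16 kHz works best).
# The language/task flags below are assumptions; the fine-tuned checkpoint's
# saved generation config may already produce Japanese output without them.
result = translator(
    "audio.wav",
    generate_kwargs={"language": "japanese", "task": "translate"},
)
print(result["text"])
```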
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- training_steps: 3500
- mixed_precision_training: Native AMP
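For reference, a hedged sketch of how these hyperparameters map onto `Seq2SeqTrainingArguments`; the output directory and evaluation cadence are assumptions inferred from the results table below, not values stated in this card.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only: mirrors the listed hyperparameters; values marked as
# assumptions were not part of the original card.
training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-small-es-ja",  # assumption
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",               # betas=(0.9, 0.999), epsilon=1e-08 are the defaults
    lr_scheduler_type="linear",
    max_steps=3500,
    fp16=True,                         # Native AMP mixed precision
    eval_strategy="steps",             # assumption: every 250 steps, per the results table
    eval_steps=250,
    predict_with_generate=True,        # assumption: needed to compute BLEU during eval
)
```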
### Training results
| Training Loss | Epoch | Step | BLEU | Validation Loss |
|:-------------:|:------:|:----:|:-------:|:---------------:|
| 1.5787 | 0.3962 | 250 | 11.6756 | 1.5196 |
| 1.3535 | 0.7924 | 500 | 16.0514 | 1.3470 |
| 1.0658 | 1.1886 | 750 | 17.7743 | 1.2533 |
| 1.0303 | 1.5848 | 1000 | 19.1894 | 1.2046 |
| 0.9893 | 1.9810 | 1250 | 20.1198 | 1.1591 |
| 0.7569 | 2.3772 | 1500 | 21.0054 | 1.1546 |
| 0.7571 | 2.7734 | 1750 | 21.6425 | 1.1378 |
| 0.5557 | 3.1696 | 2000 | 21.7563 | 1.1500 |
| 0.5612 | 3.5658 | 2250 | 21.1391 | 1.1395 |
| 0.5581 | 3.9620 | 2500 | 22.0412 | 1.1343 |
| 0.4144 | 4.3582 | 2750 | 22.2850 | 1.1724 |
| 0.4114 | 4.7544 | 3000 | 22.1925 | 1.1681 |
| 0.3005 | 5.1506 | 3250 | 21.4948 | 1.1947 |
| 0.2945 | 5.5468 | 3500 | 22.1454 | 1.1921 |
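The evaluation script is not included in this card; below is a plausible sketch of computing BLEU and chrF++ with the `evaluate` library (COMET additionally requires the `unbabel-comet` package and is omitted). The prediction and reference strings are placeholders.

```python
import evaluate

bleu = evaluate.load("sacrebleu")
chrf = evaluate.load("chrf")

predictions = ["モデルの出力例です。"]  # placeholder model outputs
references = [["参照訳の例です。"]]     # placeholder reference translations

# For Japanese, sacrebleu's "ja-mecab" tokenizer is the usual choice
# (requires the MeCab extras); the default tokenizer is used here.
print(bleu.compute(predictions=predictions, references=references)["score"])

# word_order=2 turns chrF into chrF++.
print(chrf.compute(predictions=predictions, references=references, word_order=2)["score"])
```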
### Framework versions
- Transformers 4.47.1
- Pytorch 2.4.0+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0
## Linked Models
- **[Whisper-Small-es](https://huggingface.co/Marianoleiras/whisper-small-es)**: the ASR component of the cascaded approach, built on the same dataset.
- **[NLLB-200-Distilled-es-ja](https://huggingface.co/Marianoleiras/nllb-200-distilled-es-ja)**: the MT component of the cascaded approach, built on the same dataset. A sketch of the cascaded pipeline follows below.
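For comparison with the end-to-end model above, a hedged sketch of chaining the two linked models into a cascaded pipeline; the audio path is a placeholder, and the NLLB language codes follow the FLORES-200 convention.

```python
from transformers import pipeline

# Stage 1: Spanish ASR.
asr = pipeline("automatic-speech-recognition", model="Marianoleiras/whisper-small-es")

# Stage 2: Spanish-to-Japanese MT.
mt = pipeline(
    "translation",
    model="Marianoleiras/nllb-200-distilled-es-ja",
    src_lang="spa_Latn",
    tgt_lang="jpn_Jpan",
)

spanish_text = asr("audio.wav")["text"]  # "audio.wav" is a placeholder
japanese_text = mt(spanish_text)[0]["translation_text"]
print(japanese_text)
```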
## Model Card Contact
Mariano González ([email protected])