---
library_name: transformers
tags:
- generated_from_trainer
metrics:
- bleu
model-index:
- name: whisper-small-es-ja
  results: []
datasets:
- Marianoleiras/voxpopuli_es-ja
language:
- es
- ja
base_model:
- openai/whisper-small
---

# whisper-small-es-ja

## Model Overview
This model was developed as part of a workshop organized by Yasmin Moslem, focusing on **speech-to-text pipelines**.
The workshop's primary goal was accurate transcription and translation of spoken source languages into written target languages, exploring both end-to-end and cascaded approaches along the way.

This model is the **end-to-end solution** for Spanish-to-Japanese speech-to-text (STT): a fine-tuned version of OpenAI's Whisper-small trained on the **[Marianoleiras/voxpopuli_es-ja](https://huggingface.co/datasets/Marianoleiras/voxpopuli_es-ja)** dataset.
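
As a quick illustration, the snippet below is a minimal usage sketch with the `transformers` ASR pipeline; it is not part of the original card, and the `language`/`task` decoding settings and the audio file name are assumptions based on how Whisper speech-translation fine-tunes are typically configured.

```python
# Minimal sketch (assumptions: 16 kHz input audio; decoding settings guessed
# from common Whisper speech-translation fine-tuning setups).
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="Marianoleiras/whisper-small-es-ja",
)

result = pipe(
    "spanish_sample.wav",  # hypothetical audio file containing Spanish speech
    generate_kwargs={"language": "japanese", "task": "transcribe"},
)
print(result["text"])  # Japanese text output
```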

The model achieves the following results on that dataset:

**Evaluation Set:**
- Loss: **1.1724**
- BLEU: **22.2850**

**Test Set:**
- BLEU: **20.8607**
- chrF++: **23.3571**
- COMET: **77.6979**

(For comparison, the baseline evaluation on the test set yields BLEU 0.4793.)
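
For orientation, the sketch below shows one common way to compute BLEU, chrF++, and COMET with `sacrebleu` and `unbabel-comet`. It is an illustration, not the workshop's actual evaluation script, and the example sentences are hypothetical.

```python
# Hedged evaluation sketch (assumes: pip install "sacrebleu[ja]" unbabel-comet).
import sacrebleu
from comet import download_model, load_from_checkpoint

sources = ["El gato está sentado en la alfombra."]  # Spanish transcripts
hypotheses = ["猫はマットの上に座っている。"]  # model outputs (hypothetical)
references = ["猫がマットの上に座っています。"]  # gold Japanese targets

# BLEU with a Japanese-aware tokenizer, and chrF++ (chrF with word 2-grams).
bleu = sacrebleu.corpus_bleu(hypotheses, [references], tokenize="ja-mecab")
chrf = sacrebleu.corpus_chrf(hypotheses, [references], word_order=2)

# COMET scores each hypothesis against both the source and the reference.
comet = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
scores = comet.predict(
    [{"src": s, "mt": h, "ref": r}
     for s, h, r in zip(sources, hypotheses, references)],
    batch_size=8,
    gpus=0,  # set gpus=1 if a GPU is available
)

print(bleu.score, chrf.score, 100 * scores.system_score)
```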

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- training_steps: 3500
- mixed_precision_training: Native AMP
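
For orientation, these settings map onto `transformers.Seq2SeqTrainingArguments` roughly as in the sketch below; the `output_dir` and the 250-step evaluation cadence (inferred from the results table) are assumptions, and this is a reconstruction rather than the authors' actual training script.

```python
# Rough reconstruction of the listed hyperparameters (not the original script).
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-small-es-ja",  # assumed output path
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    max_steps=3500,
    fp16=True,  # "Native AMP" mixed-precision training
    eval_strategy="steps",  # assumed from the 250-step cadence in the table
    eval_steps=250,
    predict_with_generate=True,  # needed so BLEU can be computed at eval time
)
```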

### Training results

| Training Loss | Epoch  | Step | BLEU    | Validation Loss |
|:-------------:|:------:|:----:|:-------:|:---------------:|
| 1.5787        | 0.3962 | 250  | 11.6756 | 1.5196          |
| 1.3535        | 0.7924 | 500  | 16.0514 | 1.3470          |
| 1.0658        | 1.1886 | 750  | 17.7743 | 1.2533          |
| 1.0303        | 1.5848 | 1000 | 19.1894 | 1.2046          |
| 0.9893        | 1.9810 | 1250 | 20.1198 | 1.1591          |
| 0.7569        | 2.3772 | 1500 | 21.0054 | 1.1546          |
| 0.7571        | 2.7734 | 1750 | 21.6425 | 1.1378          |
| 0.5557        | 3.1696 | 2000 | 21.7563 | 1.1500          |
| 0.5612        | 3.5658 | 2250 | 21.1391 | 1.1395          |
| 0.5581        | 3.9620 | 2500 | 22.0412 | 1.1343          |
| 0.4144        | 4.3582 | 2750 | 22.2850 | 1.1724          |
| 0.4114        | 4.7544 | 3000 | 22.1925 | 1.1681          |
| 0.3005        | 5.1506 | 3250 | 21.4948 | 1.1947          |
| 0.2945        | 5.5468 | 3500 | 22.1454 | 1.1921          |


### Framework versions

- Transformers 4.47.1
- PyTorch 2.4.0+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0

## Linked Models

- **[Whisper-Small-es](https://huggingface.co/Marianoleiras/whisper-small-es)**: the ASR component of the cascaded approach built on the same dataset.
- **[NLLB-200-Distilled-es-ja](https://huggingface.co/Marianoleiras/nllb-200-distilled-es-ja)**: the MT component of the cascaded approach built on the same dataset (chained in the sketch below).
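
To contrast the two approaches, the snippet below is a hedged sketch of chaining these two checkpoints into the cascaded pipeline; the NLLB language codes and the audio file name are assumptions, not part of the original card.

```python
# Cascaded sketch: Spanish ASR followed by Spanish-to-Japanese MT.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="Marianoleiras/whisper-small-es",
)
mt = pipeline(
    "translation",
    model="Marianoleiras/nllb-200-distilled-es-ja",
    src_lang="spa_Latn",  # NLLB code for Spanish (assumed)
    tgt_lang="jpn_Jpan",  # NLLB code for Japanese (assumed)
)

spanish_text = asr("spanish_sample.wav")["text"]  # hypothetical audio file
japanese_text = mt(spanish_text)[0]["translation_text"]
print(japanese_text)
```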

## Model Card Contact

Mariano González ([email protected])