---
language:
- id
license: apache-2.0
tags:
- whisper-event
- generated_from_trainer
datasets:
- mozilla-foundation/common_voice_11_0
- magic_data
- TITML
metrics:
- wer
base_model: openai/whisper-medium
model-index:
- name: Whisper Medium Indonesian
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: mozilla-foundation/common_voice_11_0 id
      type: mozilla-foundation/common_voice_11_0
      config: id
      split: test
    metrics:
    - type: wer
      value: 3.8273540533062804
      name: Wer
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: google/fleurs id_id
      type: google/fleurs
      config: id_id
      split: test
    metrics:
    - type: wer
      value: 9.74
      name: Wer
---
# Whisper Medium Indonesian
This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) on the
Indonesian mozilla-foundation/common_voice_11_0, magic_data, TITML and google/fleurs datasets. It achieves the following
results:
### CV11 test split:
- Loss: 0.0698
- Wer: 3.8274
### Google/fleurs test split:
- Wer: 9.74
## Usage
```python
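from transformers import pipeline

# Load the fine-tuned checkpoint as a speech-recognition pipeline.
transcriber = pipeline(
    "automatic-speech-recognition",
    model="cahya/whisper-medium-id",
)

# Force Indonesian transcription instead of language detection or translation.
transcriber.model.config.forced_decoder_ids = (
    transcriber.tokenizer.get_decoder_prompt_ids(
        language="id",
        task="transcribe",
    )
)

# The pipeline returns a dict; the transcript is under the "text" key.
transcription = transcriber("my_audio_file.mp3")
print(transcription["text"])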
```
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 10000
- mixed_precision_training: Native AMP
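
The linear schedule with warmup listed above can be sketched in plain Python. This is a minimal illustration, assuming the usual shape of a linear scheduler (linear warmup from 0 to the peak learning rate, then linear decay to 0 at the last step); the helper name `lr_at` is hypothetical and not taken from the training script.

```python
# Hyperparameters taken from the list above.
BASE_LR = 1e-06
WARMUP_STEPS = 500
TRAINING_STEPS = 10000


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to BASE_LR.
        return BASE_LR * step / WARMUP_STEPS
    # Linear decay from BASE_LR at the end of warmup to 0 at the last step.
    remaining = max(0, TRAINING_STEPS - step)
    return BASE_LR * remaining / (TRAINING_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    for step in (0, 250, 500, 5250, 10000):
        print(f"step {step:>5}: lr = {lr_at(step):.2e}")
```

Under these assumptions the learning rate peaks at step 500 and is back to half of its peak at step 5250, the midpoint of the decay phase.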
### Training results
| Training Loss | Epoch | Step | Validation Loss | Wer |
|:-------------:|:-----:|:-----:|:---------------:|:------:|
| 0.0427 | 0.33 | 1000 | 0.0664 | 4.3807 |
| 0.042 | 0.66 | 2000 | 0.0658 | 3.9426 |
| 0.0265 | 0.99 | 3000 | 0.0657 | 3.8274 |
| 0.0211 | 1.32 | 4000 | 0.0679 | 3.8366 |
| 0.0212 | 1.66 | 5000 | 0.0682 | 3.8412 |
| 0.0206 | 1.99 | 6000 | 0.0683 | 3.8689 |
| 0.0166 | 2.32 | 7000 | 0.0711 | 3.9657 |
| 0.0095 | 2.65 | 8000 | 0.0717 | 3.9980 |
| 0.0122 | 2.98 | 9000 | 0.0714 | 3.9795 |
| 0.0049 | 3.31 | 10000 | 0.0720 | 3.9887 |
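
To read the table programmatically, the sketch below copies the step, validation loss and WER columns into a list and picks the checkpoint with the lowest WER. The names `RESULTS` and `best_by_wer` are illustrative only; the numbers are the ones reported above.

```python
# (step, validation_loss, wer) rows copied from the table above.
RESULTS = [
    (1000, 0.0664, 4.3807),
    (2000, 0.0658, 3.9426),
    (3000, 0.0657, 3.8274),
    (4000, 0.0679, 3.8366),
    (5000, 0.0682, 3.8412),
    (6000, 0.0683, 3.8689),
    (7000, 0.0711, 3.9657),
    (8000, 0.0717, 3.9980),
    (9000, 0.0714, 3.9795),
    (10000, 0.0720, 3.9887),
]


def best_by_wer(rows):
    """Return the row with the lowest WER (hypothetical helper)."""
    return min(rows, key=lambda row: row[2])


step, val_loss, wer = best_by_wer(RESULTS)
print(f"lowest WER {wer} at step {step} (validation loss {val_loss})")
```

In this table the lowest WER (3.8274) is reached at step 3000; both validation loss and WER are higher at every later step.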
## Evaluation
We evaluated the model on the test splits of two datasets, [Common Voice 11](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0)
and [Google Fleurs](https://huggingface.co/datasets/google/fleurs).
Because Whisper transcribes casing and punctuation, we evaluate its performance on both raw and normalized text
(lowercased, with punctuation removed). The results are as follows:
### Common Voice 11
| | WER |
|---------------------------------------------------------------------------|------|
| [cahya/whisper-medium-id](https://huggingface.co/cahya/whisper-medium-id) | 3.83 |
| [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) | 12.62 |
### Google/Fleurs
| | WER |
|-------------------------------------------------------------------------------------------------------------|------|
| [cahya/whisper-medium-id](https://huggingface.co/cahya/whisper-medium-id) | 9.74 |
| [cahya/whisper-medium-id](https://huggingface.co/cahya/whisper-medium-id) + text normalization | tbc |
| [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) | 10.2 |
| [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) + text normalization | tbc |
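
The difference between raw and normalized scoring can be illustrated with a small, dependency-free sketch. It assumes WER is the word-level edit distance divided by the number of reference words, and that normalization means lowercasing and stripping ASCII punctuation. The evaluation behind the tables above may have used a different normalizer or library, and the example sentence is made up for illustration.

```python
import string


def normalize(text: str) -> str:
    """Lowercase and remove ASCII punctuation (assumed normalization)."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance between two strings."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def wer(references, hypotheses, normalized=False) -> float:
    """Corpus-level WER in percent over paired reference/hypothesis strings."""
    errors = words = 0
    for ref, hyp in zip(references, hypotheses):
        if normalized:
            ref, hyp = normalize(ref), normalize(hyp)
        errors += word_errors(ref, hyp)
        words += len(ref.split())
    return 100.0 * errors / words


refs = ["Selamat pagi, apa kabar?"]  # made-up reference transcript
hyps = ["selamat pagi apa kabar"]    # made-up model output
print(wer(refs, hyps))                   # raw text
print(wer(refs, hyps, normalized=True))  # normalized text
```

On raw text, every casing or punctuation mismatch counts as a word substitution, so the raw score is an upper bound on the normalized one for the same transcripts.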
### Framework versions
- Transformers 4.26.0.dev0
- Pytorch 1.13.0+cu117
- Datasets 2.7.1.dev0
- Tokenizers 0.13.2