End of training
- README.md +37 -31
- generation_config.json +1 -1
README.md
CHANGED
@@ -1,55 +1,56 @@
|
|
1 |
---
|
2 |
license: apache-2.0
|
3 |
base_model: openai/whisper-small
|
4 |
tags:
|
|
|
5 |
- generated_from_trainer
|
6 |
datasets:
|
7 |
-
- common_voice_11_0
|
8 |
model-index:
|
9 |
-
- name:
|
10 |
-
results:
|
11 |
---
|
12 |
|
13 |
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
|
14 |
should probably proofread and complete it, then remove this comment. -->
|
15 |
|
16 |
-
#
|
17 |
|
18 |
-
This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the
|
19 |
|
20 |
## Model description
|
21 |
|
22 |
-
|
23 |
-
Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalise to many datasets and domains without the need for fine-tuning.
|
24 |
-
|
25 |
-
Whisper was proposed in the paper [Robust Speech Recognition via Large-Scale Weak Supervision](https://arxiv.org/abs/2212.04356) by Alec Radford et al. of OpenAI. The original code repository can be found [here](https://github.com/openai/whisper).
|
26 |
-
|
27 |
-
### Whisper NL (noise)
|
28 |
-
This Whisper model is fine-tuned specifically on noisy Dutch data, with the aim of improving performance on that task. The base model's ability to generalise is sacrificed in the process, but the model still builds on the 680k hours of labelled data the base model received during training.
|
29 |
|
30 |
## Intended uses & limitations
|
31 |
|
32 |
-
|
33 |
-
|
34 |
-
Noise is often tied to specific contexts and recordings, so the model will not generalise to every kind of (car) noise.
|
35 |
|
36 |
## Training and evaluation data
|
37 |
|
38 |
-
|
39 |
|
40 |
## Training procedure
|
41 |
-
Training followed the procedure outlined in the original [Hugging Face blog post](https://huggingface.co/blog/fine-tune-whisper); see also the accompanying [notebook](https://colab.research.google.com/github/sanchit-gandhi/notebooks/blob/main/fine_tune_whisper.ipynb).
|
42 |
-
|
43 |
-
The only alteration was made during preparation of the dataset (`prepare_dataset(batch)`). In this step the data was [augmented](https://pytorch.org/audio/stable/tutorials/audio_data_augmentation_tutorial.html) with background noise sampled from a variety of audio files and sources. Each audio track taken from the Common Voice dataset was mixed with one randomly chosen noise sample, at a signal-to-noise ratio (SNR) chosen at random between -5 and 1 dB.
|
44 |
-
|
45 |
-
$$ \mathrm{SNR} = {{P_{signal}} \over {P_{noise}}} $$
|
46 |
-
|
47 |
-
$$ \mathrm{SNR_{dB}} = 10 \log _{10}\mathrm{SNR} $$
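As a worked example of the two definitions above, the dB endpoints of the -5 to 1 dB range convert back to linear power ratios as follows (values rounded):

$$ \mathrm{SNR} = 10^{\mathrm{SNR_{dB}}/10}, \qquad 10^{-5/10} \approx 0.316, \qquad 10^{1/10} \approx 1.259 $$

At -5 dB the noise therefore carries roughly three times the power of the speech, and at 1 dB the speech is only about 26% stronger than the noise.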
|
48 |
-
|
49 |
-
In this SNR range the noise is roughly as strong as the speech or stronger, which makes the disturbance, on average, quite invasive.
|
50 |
-
The augmentation does not, however, alter the transcriptions of the audio tracks; these remain unchanged.
|
51 |
-
|
52 |
-
Adding the noise on the fly during training would be beneficial, because the same audio track could then be paired with different noise selections, but for efficiency reasons this strategy was not applied. Each audio track is therefore augmented with a single, randomly selected noise track. The length of the audio track remains unchanged: where needed, the noise track was repeated or truncated to match it.
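The mixing step described above can be sketched roughly as follows. This is a minimal NumPy illustration and not the code used for this model: the function name `mix_at_snr`, the uniform sampling of the SNR, and the synthetic arrays standing in for real audio are all assumptions made for the example.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Mix `noise` into `speech` at the requested SNR in dB (hypothetical helper)."""
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Repeat or truncate the noise so it matches the speech length.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(speech)]

    # Average power of each signal (assumes the noise is not pure silence).
    p_signal = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)

    # Scale the noise so that 10 * log10(p_signal / p_scaled_noise) == snr_db.
    target_p_noise = p_signal / (10 ** (snr_db / 10))
    scale = np.sqrt(target_p_noise / p_noise)
    return speech + scale * noise

# Synthetic stand-ins for one second of 16 kHz speech and a shorter noise clip.
rng = np.random.default_rng(42)
speech = rng.standard_normal(16000).astype(np.float32)
noise = rng.standard_normal(5000).astype(np.float32)

# Assumption: the SNR is drawn uniformly from the -5 to 1 dB range.
snr_db = rng.uniform(-5.0, 1.0)
noisy = mix_at_snr(speech, noise, snr_db)
```

Because the noise is only scaled and added, the output has the same length as the input and the transcription label can be reused unchanged.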
|
53 |
|
54 |
### Training hyperparameters
|
55 |
|
@@ -60,16 +61,21 @@ The following hyperparameters were used during training:
|
|
60 |
- seed: 42
|
61 |
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
|
62 |
- lr_scheduler_type: linear
|
63 |
-
- lr_scheduler_warmup_steps:
|
64 |
-
- training_steps:
|
65 |
- mixed_precision_training: Native AMP
|
66 |
|
67 |
### Training results
|
68 |
|
69 |
|
70 |
### Framework versions
|
71 |
|
72 |
-
- Transformers 4.
|
73 |
- Pytorch 2.1.0+cu121
|
74 |
- Datasets 2.17.1
|
75 |
- Tokenizers 0.15.2
|
|
|
1 |
---
|
2 |
+
language:
|
3 |
+
- nl
|
4 |
license: apache-2.0
|
5 |
base_model: openai/whisper-small
|
6 |
tags:
|
7 |
+
- nl-asr-leaderboard
|
8 |
- generated_from_trainer
|
9 |
datasets:
|
10 |
+
- mozilla-foundation/common_voice_11_0
|
11 |
+
metrics:
|
12 |
+
- wer
|
13 |
model-index:
|
14 |
+
- name: Whisper Small NL - Noise
|
15 |
+
results:
|
16 |
+
- task:
|
17 |
+
name: Automatic Speech Recognition
|
18 |
+
type: automatic-speech-recognition
|
19 |
+
dataset:
|
20 |
+
name: Common Voice 11.0
|
21 |
+
type: mozilla-foundation/common_voice_11_0
|
22 |
+
config: nl
|
23 |
+
split: None
|
24 |
+
args: 'config: nl, split: test'
|
25 |
+
metrics:
|
26 |
+
- name: Wer
|
27 |
+
type: wer
|
28 |
+
value: 38.08532778355879
|
29 |
---
|
30 |
|
31 |
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
|
32 |
should probably proofread and complete it, then remove this comment. -->
|
33 |
|
34 |
+
# Whisper Small NL - Noise
|
35 |
|
36 |
+
This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the Common Voice 11.0 dataset.
|
37 |
+
It achieves the following results on the evaluation set:
|
38 |
+
- Loss: 0.6359
|
39 |
+
- Wer: 38.0853
|
40 |
|
41 |
## Model description
|
42 |
|
43 |
+
More information needed
|
44 |
|
45 |
## Intended uses & limitations
|
46 |
|
47 |
+
More information needed
|
48 |
|
49 |
## Training and evaluation data
|
50 |
|
51 |
+
More information needed
|
52 |
|
53 |
## Training procedure
|
54 |
|
55 |
### Training hyperparameters
|
56 |
|
|
|
61 |
- seed: 42
|
62 |
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
|
63 |
- lr_scheduler_type: linear
|
64 |
+
- lr_scheduler_warmup_steps: 200
|
65 |
+
- training_steps: 2000
|
66 |
- mixed_precision_training: Native AMP
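To illustrate how a linear scheduler with 200 warmup steps and 2000 training steps behaves, here is a small sketch of the learning-rate multiplier. It follows the usual shape of a linear schedule with warmup (as in Transformers' `get_linear_schedule_with_warmup`), but the function name `linear_schedule_multiplier` is hypothetical, and because the peak learning rate is not visible in this hunk only the multiplier is computed.

```python
def linear_schedule_multiplier(step, warmup_steps=200, training_steps=2000):
    """Factor applied to the peak learning rate at a given optimizer step."""
    if step < warmup_steps:
        # Linear ramp from 0 up to 1 during warmup.
        return step / max(1, warmup_steps)
    # Linear decay from 1 down to 0 over the remaining steps.
    return max(0.0, (training_steps - step) / max(1, training_steps - warmup_steps))

for step in (0, 100, 200, 1100, 2000):
    print(step, linear_schedule_multiplier(step))
```

The multiplier reaches 1.0 at step 200 and falls back to 0.0 at step 2000, so the checkpoint evaluated at step 1000 was trained at a little over half the peak rate.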
|
67 |
|
68 |
### Training results
|
69 |
|
70 |
+
| Training Loss | Epoch | Step | Validation Loss | Wer |
|
71 |
+
|:-------------:|:-----:|:----:|:---------------:|:-------:|
|
72 |
+
| 0.3722 | 0.26 | 1000 | 0.6908 | 39.8543 |
|
73 |
+
| 0.3779 | 0.53 | 2000 | 0.6359 | 38.0853 |
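The Wer column appears to be the word error rate expressed as a percentage. As a rough sketch of how that metric is defined for a single reference and hypothesis pair, the snippet below counts word-level substitutions, insertions and deletions with a standard edit-distance recurrence. The function name `word_error_rate` and the Dutch example sentences are made up for illustration; a corpus-level score, as reported in the table, would sum errors and reference words over all utterances.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate in percent for one reference/hypothesis pair."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return 100.0 * prev[-1] / len(ref)

# One substitution ("zit" -> "zat") and one deletion ("de") over six words.
wer = word_error_rate("de kat zit op de mat", "de kat zat op mat")
```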
|
74 |
+
|
75 |
|
76 |
### Framework versions
|
77 |
|
78 |
+
- Transformers 4.39.0.dev0
|
79 |
- Pytorch 2.1.0+cu121
|
80 |
- Datasets 2.17.1
|
81 |
- Tokenizers 0.15.2
|
generation_config.json
CHANGED
@@ -261,5 +261,5 @@
|
|
261 |
"transcribe": 50359,
|
262 |
"translate": 50358
|
263 |
},
|
264 |
-
"transformers_version": "4.
|
265 |
}
|
|
|
261 |
"transcribe": 50359,
|
262 |
"translate": 50358
|
263 |
},
|
264 |
+
"transformers_version": "4.39.0.dev0"
|
265 |
}
|