HHoofs committed on
Commit 6bd9e07 · verified · 1 parent: c914ab0

End of training

Files changed (2)
  1. README.md +37 -31
  2. generation_config.json +1 -1
README.md CHANGED
@@ -1,55 +1,56 @@
  ---
  license: apache-2.0
  base_model: openai/whisper-small
  tags:
  - generated_from_trainer
  datasets:
- - common_voice_11_0
  model-index:
- - name: whisper-nl-noise
-   results: []
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

- # whisper-nl-noise

- This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the Dutch subset of the [mozilla/common-voice](https://commonvoice.mozilla.org/en/datasets) dataset (version 11.0). This dataset was augmented with various forms of background noise, retrieved from [pixabay](https://pixabay.com/sound-effects/search/car/).

  ## Model description

- ### Whisper (base)
- Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalise to many datasets and domains without the need for fine-tuning.
-
- Whisper was proposed in the paper [Robust Speech Recognition via Large-Scale Weak Supervision](https://arxiv.org/abs/2212.04356) by Alec Radford et al. from OpenAI. The original code repository can be found [here](https://github.com/openai/whisper).
-
- ### Whisper NL (noise)
- This Whisper model is fine-tuned specifically on noisy Dutch data, with the aim of improving performance on that task. The model is expected to lose some of its ability to generalize in the process, but it still builds on the 680k hours of labelled data the base model was trained on.

  ## Intended uses & limitations

- This model is trained specifically on (very) noisy Dutch audio. It is expected to perform worse on audio that does not meet these criteria.
-
- Noise is often tied to specific contexts and recording conditions, so the model will not generalize to every type of (car) noise.

  ## Training and evaluation data

- The [mozilla/common-voice](https://commonvoice.mozilla.org/en/datasets) dataset (11.0) was used, with its predefined 'train' and 'test' splits. To save time, only the first 5% of the test set was used for evaluation.

  ## Training procedure
- The training procedure outlined in the [Hugging Face blog post](https://huggingface.co/blog/fine-tune-whisper) was followed; see the accompanying [Colab notebook](https://colab.research.google.com/github/sanchit-gandhi/notebooks/blob/main/fine_tune_whisper.ipynb).
-
- The only alteration was made during dataset preparation (`prepare_dataset(batch)`). In this step the data was [augmented](https://pytorch.org/audio/stable/tutorials/audio_data_augmentation_tutorial.html) with background noise sampled from various audio files and sources: each Common Voice audio track was mixed with one randomly selected noise sample. The signal-to-noise ratio (SNR) was chosen at random between -5 and 1 dB.
-
- $$ \mathrm{SNR} = {{P_{signal}} \over {P_{noise}}} $$
-
- $$ \mathrm{SNR_{dB}} = 10 \log _{10}\mathrm{SNR} $$
-
- This SNR range makes the noise disturbance, on average, quite invasive.
- The augmentation does not, however, alter the transcriptions of the audio tracks; these remain unchanged.
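Inverting the dB definition shows how loud the noise is at the two ends of the sampled range (a worked example added for illustration, using only the two formulas in this section):

$$ \mathrm{SNR} = 10^{\mathrm{SNR_{dB}}/10} $$

$$ \mathrm{SNR_{dB}} = -5 \;\Rightarrow\; {{P_{signal}} \over {P_{noise}}} = 10^{-0.5} \approx 0.316, \quad P_{noise} \approx 3.16\,P_{signal} $$

$$ \mathrm{SNR_{dB}} = 1 \;\Rightarrow\; {{P_{signal}} \over {P_{noise}}} = 10^{0.1} \approx 1.26 $$

At 0 dB the two powers are equal, so any SNR at or below 0 dB means the noise carries at least as much power as the speech. If the SNR is drawn uniformly from [-5, 1] dB (an assumption; the text only says it varied randomly), that holds for about 5/6 of the tracks.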
-
- Adding the noise on the fly during training would have been beneficial, because the same audio track could then be paired with different noise samples, but this was not done for efficiency reasons. Each audio track is therefore augmented with a single (randomly selected) noise track. The length of the audio track is unchanged; where needed, the noise track was repeated or truncated to match it.
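A minimal pure-Python sketch of this offline augmentation step, for illustration only: it is not the author's `prepare_dataset(batch)` code, and the helpers `fit_length`, `mean_power`, `mix_at_snr` and `augment` are hypothetical names. It assumes mono waveforms given as lists of floats at a common sampling rate, power measured as the mean squared amplitude over the whole track, and an SNR drawn uniformly in dB. The linked torchaudio tutorial shows a tensor-based equivalent.

```python
import math
import random


def fit_length(noise: list[float], target_len: int) -> list[float]:
    """Repeat or truncate a noise track so that it has exactly target_len samples."""
    if not noise:
        raise ValueError("noise track is empty")
    repeats = math.ceil(target_len / len(noise))
    return (noise * repeats)[:target_len]


def mean_power(signal: list[float]) -> float:
    """Average power of a signal, taken here as its mean squared amplitude."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech: list[float], noise: list[float], snr_db: float) -> list[float]:
    """Add noise to speech, scaled so that 10 * log10(P_signal / P_noise) equals snr_db."""
    if not speech:
        raise ValueError("speech track is empty")
    noise = fit_length(noise, len(speech))  # the speech length stays unchanged
    p_signal = mean_power(speech)
    p_noise = mean_power(noise)
    if p_noise == 0.0:
        raise ValueError("noise track is silent")
    # Required noise power is P_signal / 10^(snr_db / 10); amplitude scales with sqrt(power).
    scale = math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]


def augment(speech: list[float], noise_bank: list[list[float]], rng: random.Random,
            snr_range: tuple[float, float] = (-5.0, 1.0)) -> list[float]:
    """Mix one randomly chosen noise track into speech at a random SNR (assumed uniform in dB)."""
    noise = rng.choice(noise_bank)
    snr_db = rng.uniform(*snr_range)
    return mix_at_snr(speech, noise, snr_db)


# Example: one second of a 440 Hz tone at 16 kHz mixed with a short, repeated noise burst.
rng = random.Random(42)
speech = [0.5 * math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise_bank = [[rng.uniform(-1.0, 1.0) for _ in range(4000)]]
noisy = augment(speech, noise_bank, rng)
print(len(noisy))  # 16000: same length as the speech track
```

Drawing the noise track and SNR once per speech track mirrors the single-noise-per-track choice described above; calling `augment` inside the training loop instead would give the on-the-fly variant. The transcription is never touched.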

  ### Training hyperparameters

@@ -60,16 +61,21 @@ The following hyperparameters were used during training:
  - seed: 42
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: linear
- - lr_scheduler_warmup_steps: 2
- - training_steps: 20
  - mixed_precision_training: Native AMP

  ### Training results


  ### Framework versions

- - Transformers 4.37.2
  - Pytorch 2.1.0+cu121
  - Datasets 2.17.1
  - Tokenizers 0.15.2
 
  ---
+ language:
+ - nl
  license: apache-2.0
  base_model: openai/whisper-small
  tags:
+ - nl-asr-leaderboard
  - generated_from_trainer
  datasets:
+ - mozilla-foundation/common_voice_11_0
+ metrics:
+ - wer
  model-index:
+ - name: Whisper Small NL - Noise
+   results:
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Common Voice 11.0
+       type: mozilla-foundation/common_voice_11_0
+       config: nl
+       split: None
+       args: 'config: nl, split: test'
+     metrics:
+     - name: Wer
+       type: wer
+       value: 38.08532778355879
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

+ # Whisper Small NL - Noise

+ This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the Common Voice 11.0 dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.6359
+ - Wer: 38.0853
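The reported Wer is a percentage: the Hugging Face blog computes it with the `evaluate` library's `wer` metric and multiplies the result by 100. The pure-Python sketch below shows what that number measures, assuming whitespace tokenisation and no text normalisation; `word_edit_distance` and `corpus_wer` are hypothetical helper names, not the metric's actual implementation. Word-level edit operations are pooled over the whole evaluation set and divided by the total number of reference words.

```python
def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference word missing in hypothesis
                curr[j - 1] + 1,         # insertion: extra word in hypothesis
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def corpus_wer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level WER in percent: total word errors / total reference words * 100."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    errors = sum(word_edit_distance(r.split(), h.split())
                 for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    if words == 0:
        raise ValueError("references contain no words")
    return 100.0 * errors / words


# One substitution (zit -> zat) and one deletion (de) over 6 reference words: about 33.3
print(corpus_wer(["de kat zit op de mat"], ["de kat zat op mat"]))
```

Because errors are pooled, a corpus WER differs from the mean of per-utterance WERs: `corpus_wer(["a b", "c d e f"], ["a x", "c d e f"])` gives 1/6, about 16.7%, not the 25% an unweighted average of 50% and 0% would give.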

  ## Model description

+ More information needed

  ## Intended uses & limitations

+ More information needed

  ## Training and evaluation data

+ More information needed

  ## Training procedure

  ### Training hyperparameters

  - seed: 42
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: linear
+ - lr_scheduler_warmup_steps: 200
+ - training_steps: 2000
  - mixed_precision_training: Native AMP
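With `lr_scheduler_type: linear`, the Trainer uses the linear warmup / linear decay schedule from `transformers` (`get_linear_schedule_with_warmup`). The sketch below reproduces only the shape of that multiplier for the new settings (200 warmup steps, 2000 training steps). The base learning rate sits in the lines hidden from this diff, so the function returns the factor applied to it rather than an absolute value; `linear_warmup_factor` is a hypothetical name.

```python
def linear_warmup_factor(step: int, warmup_steps: int = 200, total_steps: int = 2000) -> float:
    """Factor applied to the base learning rate at a given optimizer step."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 to 1.
        return step / max(1, warmup_steps)
    # Decay: fall linearly from 1 at the end of warmup to 0 at total_steps.
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


for step in (0, 100, 200, 1100, 2000):
    print(step, linear_warmup_factor(step))  # factors 0.0, 0.5, 1.0, 0.5, 0.0
```

The previous settings (2 warmup steps, 20 training steps) had the same shape compressed into 20 steps; in both versions the peak learning rate is reached after 10% of training.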

  ### Training results

+ | Training Loss | Epoch | Step | Validation Loss | Wer |
+ |:-------------:|:-----:|:----:|:---------------:|:-------:|
+ | 0.3722 | 0.26 | 1000 | 0.6908 | 39.8543 |
+ | 0.3779 | 0.53 | 2000 | 0.6359 | 38.0853 |
+
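The Epoch column also indicates how much of the training split the run covered. Since the epoch values are rounded to two decimals, the implied number of optimizer steps per epoch is only approximate:

$$ \frac{1000}{0.26} \approx 3846, \qquad \frac{2000}{0.53} \approx 3774 $$

One epoch is therefore roughly 3,800 steps, and the 2,000 training steps cover about half (0.53) of an epoch.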

  ### Framework versions

+ - Transformers 4.39.0.dev0
  - Pytorch 2.1.0+cu121
  - Datasets 2.17.1
  - Tokenizers 0.15.2
generation_config.json CHANGED
@@ -261,5 +261,5 @@
  "transcribe": 50359,
  "translate": 50358
  },
- "transformers_version": "4.37.2"
  }

  "transcribe": 50359,
  "translate": 50358
  },
+ "transformers_version": "4.39.0.dev0"
  }