---
license: mit
datasets:
- M9and2M/Wolof_ASR_dataset
language:
- wo
metrics:
- wer
pipeline_tag: automatic-speech-recognition
tags:
- Wolof
- ASR
---

# Wolof ASR Model (Based on Whisper-Small) Trained on a Mixed Human- and Machine-Generated Dataset

## Model Overview

This repository hosts an Automatic Speech Recognition (ASR) model for the Wolof language, fine-tuned from OpenAI's Whisper-small model. The model aims to provide accurate transcription of Wolof audio.

## Model Details

- **Model Base**: Whisper-small
- **Loss**: 0.123
- **WER**: 0.16
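
Assuming the fine-tuned weights are published in the standard Whisper format, inference can be sketched with the `transformers` ASR pipeline. The repo id and audio path below are placeholders, not identifiers confirmed by this card:

```python
from transformers import pipeline

# Placeholder repo id; replace with this model's actual Hugging Face id.
MODEL_ID = "M9and2M/wolof-asr-whisper-small"

def transcribe(audio_path: str) -> str:
    """Load the ASR pipeline and transcribe one audio file."""
    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    return asr(audio_path)["text"]

# Usage (with any 16 kHz Wolof audio file):
# print(transcribe("example_wolof_clip.wav"))
```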

## Dataset

The dataset used for training and evaluating this model is a collection from various sources, ensuring a rich and diverse set of Wolof audio samples. The collection, available in my Hugging Face account, was filtered to keep only audio clips shorter than 6 seconds. In addition, audio from YouTube videos was used to synthesize labeled data. This machine-generated data was mixed with the training dataset and represents 19% of the data used during training.

- **Training Dataset**: 57 hours of human-transcribed audio plus 13 hours of audio with machine-generated transcripts
- **Test Dataset**: 10 hours

For detailed information about the dataset, please refer to [M9and2M/Wolof_ASR_dataset](https://huggingface.co/datasets/M9and2M/Wolof_ASR_dataset).
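
The 6-second duration filter described above can be sketched as follows, assuming examples follow the Hugging Face `datasets` audio layout (`{"audio": {"array": ..., "sampling_rate": ...}}`); the exact column names used in the original preprocessing are an assumption:

```python
def duration_seconds(audio):
    """Length of a decoded audio clip in seconds: samples / sample rate."""
    return len(audio["array"]) / audio["sampling_rate"]

def shorter_than_6s(example):
    """Predicate suitable for datasets.Dataset.filter()."""
    return duration_seconds(example["audio"]) < 6.0

# With the real dataset this would be applied as:
# ds = load_dataset("M9and2M/Wolof_ASR_dataset", split="train")
# ds = ds.filter(shorter_than_6s)
```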

## Training

The training process was adapted from the code in [Finetune Wa2vec 2.0 For Speech Recognition](https://github.com/khanld/ASR-Wa2vec-Finetune), written to fine-tune Wav2Vec 2.0 for speech recognition. Special thanks to the author, Duy Khanh Le, for providing a robust and flexible training framework.
40
+
41
+ The model was trained with the following configuration:
42
+
43
+ - **Seed**: 19
44
+ - **Training Batch Size**: 1
45
+ - **Gradient Accumulation Steps**: 8
46
+ - **Number of GPUs**: 2
47
+
48
+ ### Optimizer : AdamW
49
+
50
+ - **Learning Rate**: 1e-7
51
+
52
+ ### Scheduler: OneCycleLR
53
+
54
+ - **Max Learning Rate**: 5e-5
55
+
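
As a rough illustration, the optimizer, scheduler, and gradient-accumulation settings above map onto PyTorch as sketched below. The toy linear model and the `total_steps` value are placeholders for the real Whisper-small setup, not values from the original training run:

```python
import torch

# Toy stand-in for the fine-tuned Whisper-small parameters.
model = torch.nn.Linear(4, 4)

# Optimizer: AdamW with the base learning rate from the card.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-7)

# Scheduler: OneCycleLR ramps the learning rate up to max_lr and back down.
# total_steps is a placeholder; in practice it depends on dataset size,
# batch size, gradient accumulation, number of GPUs, and epochs.
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=5e-5, total_steps=1000
)

# Gradient accumulation: step the optimizer once every 8 micro-batches.
accum_steps = 8
for step in range(16):
    x = torch.randn(1, 4)
    loss = model(x).sum()
    (loss / accum_steps).backward()  # scale so gradients average correctly
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
        scheduler.step()
```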

## Acknowledgements

This model was built using OpenAI's [Whisper-small](https://huggingface.co/openai/whisper-small) architecture and fine-tuned with a dataset collected from various sources. Special thanks to the creators and contributors of the dataset.

## More Information

This model was developed in the context of my Master's Thesis at ETSIT-UPM, Madrid, under the supervision of Prof. Luis A. Hernández Gómez.

## Contact

For any inquiries or questions, please contact [email protected]