---
license: mit
datasets:
- M9and2M/Wolof_ASR_dataset
language:
- wo
metrics:
- wer
pipeline_tag: automatic-speech-recognition
tags:
- Wolof
- ASR
---

# Wolof ASR Model (Based on Whisper-Small) Trained on a Mixed Human- and Machine-Generated Dataset

## Model Overview

This repository hosts an Automatic Speech Recognition (ASR) model for the Wolof language, fine-tuned from OpenAI's Whisper-small model. This model aims to provide accurate transcription of Wolof audio data.
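
A minimal inference sketch with the `transformers` ASR pipeline is shown below; the model ID is a placeholder standing in for this repository's actual Hub ID:

```python
# Minimal inference sketch using the transformers ASR pipeline.
# "M9and2M/wolof-asr-whisper-small" is a placeholder; use this repository's actual Hub ID.
from transformers import pipeline

transcriber = pipeline(
    "automatic-speech-recognition",
    model="M9and2M/wolof-asr-whisper-small",
)

result = transcriber("path/to/wolof_audio.wav")
print(result["text"])
```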

## Model Details

- **Model Base**: Whisper-small
- **Loss**: 0.123
- **WER (Word Error Rate)**: 0.16
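
For reference, WER can be reproduced with the `evaluate` library; this is a generic sketch, and the Wolof strings below are purely illustrative:

```python
# Minimal WER computation sketch using the `evaluate` library.
import evaluate

wer_metric = evaluate.load("wer")
wer = wer_metric.compute(
    predictions=["waaw dégg naa la"],        # illustrative model hypothesis
    references=["waaw dégg naa la bu baax"], # illustrative reference transcript
)
print(f"WER: {wer:.2f}")
```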

## Dataset

The dataset used for training and evaluating this model is a collection from various sources, providing a rich and diverse set of Wolof audio samples. From the collection available in my Hugging Face account, only audio clips shorter than 6 seconds were kept. In addition, audio from YouTube videos was used to synthesize labeled data; this machine-generated data was mixed into the training set and represents 19% of the data used during training.

- **Training Dataset**: 57 hours of human-transcribed audio, plus 13 hours of audio with machine-generated transcripts
- **Test Dataset**: 10 hours

For detailed information about the dataset, please refer to [M9and2M/Wolof_ASR_dataset](https://huggingface.co/datasets/M9and2M/Wolof_ASR_dataset).
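
A sketch of the duration filter described above, using the `datasets` library; the `audio` column name and split are assumptions about the dataset's layout:

```python
# Sketch of the <6 s duration filter described above (column names are assumptions).
from datasets import load_dataset

ds = load_dataset("M9and2M/Wolof_ASR_dataset", split="train")

def shorter_than_6s(example):
    audio = example["audio"]  # assumes a standard HF Audio column
    return len(audio["array"]) / audio["sampling_rate"] < 6.0

ds = ds.filter(shorter_than_6s)
```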

## Training

The training process was adapted from the code in [Finetune Wav2Vec 2.0 for Speech Recognition](https://github.com/khanld/ASR-Wa2vec-Finetune), originally written to fine-tune Wav2Vec 2.0 for speech recognition. Special thanks to the author, Duy Khanh Le, for providing a robust and flexible training framework.

The model was trained with the following configuration:

- **Seed**: 19
- **Training Batch Size**: 1
- **Gradient Accumulation Steps**: 8
- **Number of GPUs**: 2

With per-GPU batching, this corresponds to an effective batch size of 16 (1 × 8 accumulation steps × 2 GPUs).

### Optimizer: AdamW

- **Learning Rate**: 1e-7

### Scheduler: OneCycleLR

- **Max Learning Rate**: 5e-5
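
As a sketch, the optimizer and scheduler settings above map onto the following PyTorch setup; the model and step count are placeholders, not the actual training run:

```python
# PyTorch sketch of the AdamW + OneCycleLR configuration listed above.
# The model here is a stand-in; in the real run it is the Whisper-small network.
import torch

model = torch.nn.Linear(10, 10)  # placeholder module
total_steps = 10_000             # placeholder: number of optimizer steps in the run

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-7)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=5e-5,  # peak learning rate of the one-cycle schedule
    total_steps=total_steps,
)

# In the training loop: call optimizer.step(), then scheduler.step(),
# once per optimizer update (i.e., after each set of accumulated gradients).
```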

## Acknowledgements

This model was built using OpenAI's [Whisper-small](https://huggingface.co/openai/whisper-small) architecture and fine-tuned with a dataset collected from various sources. Special thanks to the creators and contributors of the dataset.

## More Information

This model was developed in the context of my Master's Thesis at ETSIT-UPM, Madrid, under the supervision of Prof. Luis A. Hernández Gómez.

## Contact

For any inquiries or questions, please contact [email protected].