---
license: mit
datasets:
- M9and2M/Wolof_ASR_dataset
language:
- wo
metrics:
- wer
pipeline_tag: automatic-speech-recognition
tags:
- Wolof
- ASR
---
# Wolof ASR Model (Based on Whisper-Small) Trained on a Mixed Human- and Machine-Generated Dataset
## Model Overview
This repository hosts an Automatic Speech Recognition (ASR) model for the Wolof language, fine-tuned from OpenAI's Whisper-small model. This model aims to provide accurate transcription of Wolof audio data.
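A minimal sketch of how a fine-tuned Whisper checkpoint like this one could be used for transcription with the `transformers` pipeline API. The `MODEL_ID` value is a hypothetical placeholder (this card does not state the repository ID), and `transcribe` is an illustrative helper, not part of the released code. Decoding an audio file by path also assumes `ffmpeg` is installed.

```python
# Hypothetical placeholder: replace with the actual Hub ID or local path of this model.
MODEL_ID = "your-namespace/your-wolof-whisper-small"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with an ASR pipeline (illustrative helper)."""
    # Imported lazily so the module can be loaded without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # The pipeline returns a dict whose "text" field holds the transcription.
    return asr(audio_path)["text"]


if __name__ == "__main__":
    # "sample.wav" is a hypothetical file name.
    print(transcribe("sample.wav"))
```

Since training kept only clips shorter than 6 seconds, short utterances are likely the closest match to the training conditions.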
## Model Details
- **Model Base**: Whisper-small
- **Loss**: 0.123
- **WER**: 0.16
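
For readers unfamiliar with the metric, WER is the word-level edit distance between the reference and the predicted transcript, divided by the number of reference words, so 0.16 corresponds to roughly 16 word errors per 100 reference words. The following is a minimal sketch of that definition; the function name and the placeholder tokens are illustrative, and the evaluation code actually used for this model may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j]: edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens, not real Wolof: one substitution and one deletion
# against a four-word reference.
print(word_error_rate("w1 w2 w3 w4", "w1 wX w3"))  # 0.5
```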
## Dataset
The training and evaluation data come from a collection of several sources, which gives a rich and diverse set of Wolof audio samples. The collection is available on my Hugging Face account; only audio clips shorter than 6 seconds were kept. In addition to this dataset, audio from YouTube videos was used to synthesize labeled data. This machine-generated data was mixed into the training set and represents 19% of the data used during training.
- **Training Dataset**: 57 hours of audio, plus 13 hours of audio with machine-generated transcripts
- **Test Dataset**: 10 hours
For detailed information about the dataset, please refer to [M9and2M/Wolof_ASR_dataset](https://huggingface.co/datasets/M9and2M/Wolof_ASR_dataset).
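
The duration filter and the machine-generated share can be sketched as follows. This is an illustration only: the clip records and the `keep_short_clips` helper are hypothetical, and the share calculation assumes the 57 and 13 hours are disjoint parts of a 70-hour training set.

```python
MAX_DURATION_S = 6.0  # clips at or above this length are dropped


def keep_short_clips(clips, max_duration_s=MAX_DURATION_S):
    """Keep only clips strictly shorter than the duration limit."""
    return [clip for clip in clips if clip["duration_s"] < max_duration_s]


# Hypothetical records; real metadata would come from the dataset itself.
clips = [
    {"path": "clip_a.wav", "duration_s": 3.2},
    {"path": "clip_b.wav", "duration_s": 7.5},
    {"path": "clip_c.wav", "duration_s": 5.9},
]
short_clips = keep_short_clips(clips)
print([clip["path"] for clip in short_clips])  # ['clip_a.wav', 'clip_c.wav']

# Share of machine-labeled audio, assuming 57 h + 13 h = 70 h in total.
HUMAN_HOURS = 57
MACHINE_HOURS = 13
machine_share = MACHINE_HOURS / (HUMAN_HOURS + MACHINE_HOURS)
print(f"{machine_share:.1%}")  # 18.6%, which rounds to the quoted 19%
```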
## Training
The training process was adapted from the code in [Finetune Wav2vec 2.0 For Speech Recognition](https://github.com/khanld/ASR-Wa2vec-Finetune), which was written to fine-tune Wav2Vec 2.0 for speech recognition. Special thanks to the author, Duy Khanh, Le, for providing a robust and flexible training framework.
The model was trained with the following configuration:
- **Seed**: 19
- **Training Batch Size**: 1
- **Gradient Accumulation Steps**: 8
- **Number of GPUs**: 2
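
As a quick check on what these settings imply, the effective batch size per optimizer step can be computed as below. This assumes the batch size of 1 is per GPU and that the two GPUs run in data-parallel mode, neither of which is stated explicitly above.

```python
per_device_batch_size = 1
gradient_accumulation_steps = 8
num_gpus = 2

# Samples contributing to each optimizer update under the assumptions above.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 16
```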
### Optimizer: AdamW
- **Learning Rate**: 1e-7
### Scheduler: OneCycleLR
- **Max Learning Rate**: 5e-5
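
To show the shape of a one-cycle schedule peaking at 5e-5, here is a dependency-free sketch of two-phase cosine annealing in the style of PyTorch's `torch.optim.lr_scheduler.OneCycleLR`. The values of `pct_start`, `div_factor`, and `final_div_factor` are PyTorch's defaults and are an assumption about this run, and `total_steps=1000` is purely illustrative. Note that in PyTorch's `OneCycleLR` the starting rate is derived from `max_lr / div_factor` rather than from the optimizer's own learning rate, so how the 1e-7 value above interacts with the schedule depends on the training code.

```python
import math


def one_cycle_lr(step, total_steps, max_lr=5e-5, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
    """Learning rate at `step` for a two-phase cosine one-cycle schedule."""
    initial_lr = max_lr / div_factor         # rate at step 0
    min_lr = initial_lr / final_div_factor   # rate at the last step
    warmup_end = pct_start * total_steps - 1
    last_step = total_steps - 1

    def cos_anneal(start, end, pct):
        # Cosine interpolation from `start` (pct=0) to `end` (pct=1).
        return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1.0)

    if step <= warmup_end:
        return cos_anneal(initial_lr, max_lr, step / warmup_end)
    return cos_anneal(max_lr, min_lr,
                      (step - warmup_end) / (last_step - warmup_end))


total_steps = 1000  # illustrative; the real number of steps is not given here
schedule = [one_cycle_lr(s, total_steps) for s in range(total_steps)]
print(f"start={schedule[0]:.2e} peak={max(schedule):.2e} end={schedule[-1]:.2e}")
```

The rate rises over roughly the first 30% of steps, then decays toward a value far below the starting rate.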
## Acknowledgements
This model was built using OpenAI's [Whisper-small](https://huggingface.co/openai/whisper-small) architecture and fine-tuned with a dataset collected from various sources. Special thanks to the creators and contributors of the dataset.
## More Information
This model was developed as part of my Master's thesis at ETSIT-UPM, Madrid, under the supervision of Prof. Luis A. Hernández Gómez.
## Contact
For any inquiries or questions, please contact [email protected]