---
license: cc-by-nc-4.0
language: ddn
metrics:
- wer
tags:
- text-to-audio
- automatic-speech-recognition
- wav2vec2-fine-tuning
- dendi-text-to-speech
model-index:
- name: Dendi Numerals ASR
  results:
  - task: 
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: dendi
      type: dendi_numbers_dataset
    metrics:
       - name: Test WER
         type: wer
         value: 18.18
pipeline_tag: automatic-speech-recognition
---

# CreaTiv Team (CTT): Dendi Numerals Automatic Speech Recognition

This repository contains an Automatic Speech Recognition (ASR) model built specifically to recognize spoken numerals in the Dendi (ddn) language.
The model covers numbers from 0 to 1,000,000,000 when spoken in Dendi.

This model is part of CreaTiv Team's [Noulinmon](https://noulinmon.baruwuu.bj/) project, a user-friendly mobile app designed to make calculations accessible in six local languages of Benin, featuring voice reading and AI capabilities.
You can find more CTT-ASR models on the Hugging Face Hub: [ssid32/ctt-asr](https://huggingface.co/models?sort=trending&search=ssid32).

CTT-ASR is available in the 🤗 Transformers library from version 4.4 onwards.

## Model Details

The model is a fine-tuned version of [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Dendi.
When using this model, make sure that your speech input is sampled at 16 kHz.
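
As a quick sanity check before inference, the sampling rate of a WAV file can be read from its header. The following is a minimal sketch using only Python's standard `wave` module, which handles uncompressed PCM WAV files; the helper names `wav_sample_rate` and `needs_resampling` are hypothetical and not part of this repository.

```python
import wave

TARGET_RATE = 16_000  # sampling rate required by this model card


def wav_sample_rate(path):
    """Return the sampling rate (in Hz) stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, target_rate=TARGET_RATE):
    """Return True if the file's sampling rate differs from the target rate."""
    return wav_sample_rate(path) != target_rate
```

For example, `needs_resampling("audio_test.wav")` would return `True` for a file recorded at 44.1 kHz, in which case the audio should be resampled to 16 kHz before it is passed to the processor.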


## Usage

To use this model, first install the latest version of the 🤗 Transformers library, along with PyTorch and torchaudio, which the inference snippet imports:

```
pip install --upgrade transformers accelerate torch torchaudio
```

Then, run inference with the following code snippet:

```python
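import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "ssid32/wav2vec2-xlsr-dendi-ddn-for-numerals"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Load the audio file; torchaudio returns a (channels, samples) tensor.
speech_array, sampling_rate = torchaudio.load("audio_test.wav")

# The model expects 16 kHz input, so resample if the file uses another rate.
if sampling_rate != 16_000:
    speech_array = torchaudio.functional.resample(speech_array, sampling_rate, 16_000)

# Downmix to mono and convert to a 1-D NumPy array.
speech_array = speech_array.mean(dim=0).numpy()
inputs = processor(speech_array, sampling_rate=16_000, return_tensors="pt", padding=True)

# Greedy decoding: take the most likely token at each frame, then decode.
with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
output = processor.batch_decode(torch.argmax(logits, dim=-1))

print("Output:", output)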
```
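
The `torch.argmax` call followed by `processor.batch_decode` in the snippet above amounts to greedy CTC decoding: the most likely token is picked for each audio frame, consecutive repeats are merged, and blank tokens are dropped. The sketch below illustrates that collapse step in plain Python. The toy vocabulary, the blank id of `0`, and the function name `ctc_greedy_collapse` are assumptions made for illustration; the real token ids come from the model's tokenizer.

```python
def ctc_greedy_collapse(frame_ids, blank_id=0):
    """Merge consecutive repeated ids, then drop blanks (greedy CTC collapse)."""
    collapsed = []
    previous = None
    for token_id in frame_ids:
        # Keep a token only when it differs from the previous frame
        # and is not the blank symbol.
        if token_id != previous and token_id != blank_id:
            collapsed.append(token_id)
        previous = token_id
    return collapsed


# Toy vocabulary (hypothetical): id 0 is the blank, "|" marks a word boundary.
vocab = {0: "<pad>", 1: "a", 2: "b", 3: "|"}

# Per-frame argmax ids, as they might come out of an acoustic model.
frames = [0, 1, 1, 0, 1, 2, 2, 0, 3, 3, 2]

ids = ctc_greedy_collapse(frames)
text = "".join(vocab[i] for i in ids).replace("|", " ")
print(text)  # prints: aab b
```

Note that the blank between the two runs of `a` is what allows a doubled letter to survive the merge step.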



You can listen to the sample audio here:

<audio controls>
  <source src="https://huggingface.co/ssid32/wav2vec2-xlsr-dendi-ddn-for-numerals/resolve/main/audio_test.wav" type="audio/wav">
  Your browser does not support the audio element.
</audio>

Upon processing the sample audio, the model produces the following output:

```
Output: ['zangu ihaaku nda weiguu']
```

In this case, the output represents the numeral **850** in the Dendi language.

### Evaluation result

On the test set, the model achieves a Word Error Rate (WER) of **18.18%**.
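
For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the model output, divided by the number of reference words. The following is a minimal sketch of that computation, not the evaluation script used for this model; the function name `word_error_rate` and the example hypothesis are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref_words)


# Reference taken from the sample output above; the hypothesis is a
# made-up transcript with one word missing.
reference = "zangu ihaaku nda weiguu"
hypothesis = "zangu ihaaku weiguu"
print(word_error_rate(reference, hypothesis))  # prints: 0.25
```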

## Authors

This model was developed by:
- Salim KORA GUERA (Hugging Face username: [ssid32](https://huggingface.co/ssid32))
- Etienne TOVIMAFA (Hugging Face username: [MrBendji](https://huggingface.co/MrBendji))

## Citation

```bibtex
@misc { kora_guera_tovimafa_2024,
	author       = { {Salim KORA GUERA} and {Etienne TOVIMAFA} },
	title        = { wav2vec2-xlsr-dendi-ddn-for-numerals },
	year         = 2024,
	url          = { https://huggingface.co/ssid32/wav2vec2-xlsr-dendi-ddn-for-numerals },
	doi          = { 10.57967/hf/2930 },
	publisher    = { Hugging Face }
}
```

## License

The model is licensed under **CC-BY-NC 4.0**.