---
license: afl-3.0
datasets:
- orkidea/wayuu_CO_train
language:
- guc
metrics:
- wer
pipeline_tag: automatic-speech-recognition
base_model: openai/whisper-small
tags:
- transcription
- ASR
- wayuunaiki
---
# Model Background

This model was fine-tuned on a dataset built from parsed audio recordings of the Bible in **Wayuunaiki** and their corresponding transcriptions. Due to proprietary restrictions, this training data cannot be shared publicly.

**Wayuunaiki** is the native language of the Wayuu people and belongs to the Arawakan language family. It is spoken predominantly by communities in Colombia and Venezuela, where its large number of speakers makes it one of the more widely spoken indigenous languages in the region.

This model is an initial step toward transcription models built specifically for indigenous languages. Such models can have significant societal impact: they help preserve and promote indigenous languages, and they give scholars and communities a practical tool for studying and documenting the cultural heritage these languages carry.

## Training Dataset Details

The dataset consists of 1,835 audio recordings, each paired with its transcription. Its vocabulary comprises approximately 3,000 unique words.

- **Total Audio Duration**: 6,241.65 seconds (approximately 1.7 hours)
- **Average Audio Duration**: about 3.4 seconds per recording
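
The summary figures above are simple aggregates over per-clip durations. A minimal sketch of that computation follows; the `summarize_durations` helper and the three example durations are illustrative only and do not come from the dataset.

```python
from statistics import mean


def summarize_durations(durations_s: list[float]) -> dict:
    """Aggregate per-clip durations (in seconds) into dataset-level figures."""
    if not durations_s:
        raise ValueError("need at least one clip")
    total = sum(durations_s)
    return {
        "clips": len(durations_s),
        "total_seconds": total,
        "total_hours": total / 3600,
        "mean_seconds": mean(durations_s),
    }


# Hypothetical durations for three clips, not taken from the real dataset.
print(summarize_durations([2.0, 3.5, 4.5]))
```

Applied to the real recordings, `clips` would be 1,835 and `total_seconds` 6,241.65.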

This collection of data serves as a foundational resource for understanding and processing the Wayuunaiki language.

**[The test dataset](https://huggingface.co/datasets/orkidea/wayuu_CO_test)** may be used under the principles of [fair use](https://en.wikipedia.org/wiki/Fair_use).


# Model Accuracy Warning

While this model has shown promising results, it's essential to be aware of its limitations:

- On the validation set, the model reaches a Word Error Rate (WER) of about 36% (36.3%). That corresponds to roughly 36 word-level errors (substitutions, deletions, or insertions) per 100 reference words: the output usually conveys the gist of the speech, but errors are frequent.
  
- The model particularly struggles with long vowels, which occasionally leads to transcription errors in words that contain them.

- This iteration is a starting point for refining future models. It transcribes most words correctly but, like any machine learning model, it is not infallible.
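
WER is the word-level edit distance between the model's output and the reference transcription, divided by the number of reference words. A minimal, dependency-free sketch is below; in practice a library such as `jiwer` or Hugging Face `evaluate` computes the same metric, and the text normalization used for the reported 36.3% is not documented here. The example hypothesis is invented for illustration (it shortens the long vowel in *anneerü*) and is not actual model output.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the previous reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


reference = "maa akaapüꞌü tü anneerü"
hypothesis = "maa akaapüꞌü tü annerü"  # invented error: long "ee" shortened
print(word_error_rate(reference, hypothesis))  # 0.25: one substitution over four words
```

Because insertions count as errors, WER can exceed 100% when the output is much longer than the reference.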

**Recommendation**: Review and correct every transcription produced by this model before relying on it. The model is best used to produce initial drafts, not final text.
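
Because long vowels are a known weak spot, one way to focus manual review is to flag the words that contain them. The sketch below assumes that long vowels are spelled as doubled vowel letters (*aa*, *ee*, *ii*, *oo*, *uu*, *üü*), as in *anneerü* or *waainjala* from the sample table; `words_to_review` is a hypothetical helper, not part of this model or any library.

```python
import re
import unicodedata

# Assumption: a long vowel is spelled as the same vowel letter written twice.
LONG_VOWEL = re.compile(r"([aeiouü])\1", re.IGNORECASE)


def words_to_review(transcription: str) -> list[str]:
    """Return the words that contain a doubled (long) vowel, in order of appearance."""
    # NFC turns a decomposed "u" + combining diaeresis into a single "ü" code point.
    text = unicodedata.normalize("NFC", transcription)
    return [word for word in text.split() if LONG_VOWEL.search(word)]


sample = "maa akaapüꞌü tü anneerü watüma wayakana waainjala"
print(words_to_review(sample))
```

This is only a review aid: it cannot tell whether a flagged vowel is transcribed correctly, and it will miss long vowels the model shortened to a single letter.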

# Test it yourself

| Transcription | Audio Link |
|---------------|------------|
| iseeichi chi wayuu aneekünakai nütüma Maleiwa süpüla nuꞌutünajachin aaꞌin süpüla nülaꞌajaainjatüin saainjala wayuu süpüshua sainküin mmakat | [Listen here](https://storage.googleapis.com/audio-guc/audio/test/85.wav) |
| maa akaapüꞌü tü anneerü oꞌutünapüꞌükat aaꞌin watüma wayakana judíokana shiiꞌiree sülaꞌajaanüin waainjala | [Listen here](https://storage.googleapis.com/audio-guc/audio/test/86.wav) |

The table pairs sample transcriptions with links to the corresponding audio. Listening to these clips lets you judge the model's transcription quality firsthand and see how it handles specific features of Wayuunaiki, as well as where it could be refined.
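
Before transcribing the sample clips (or your own recordings), it helps to check their format, since Whisper models operate on 16 kHz audio and other sample rates must be resampled first. A minimal sketch using only Python's standard `wave` module follows (PCM WAV only); `describe_wav` and `make_silent_wav` are hypothetical helpers, and the sample rate of the linked clips is not stated in this card, so check it rather than assume it. To inspect a real clip, download it first (for example with `urllib.request.urlretrieve`) and pass the local path to `describe_wav`.

```python
import io
import wave

WHISPER_SAMPLE_RATE = 16_000  # Whisper's feature extractor expects 16 kHz input


def describe_wav(source) -> dict:
    """Read basic properties of a PCM WAV file given a path or binary file object."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "duration_seconds": wav.getnframes() / rate,
            "needs_resampling": rate != WHISPER_SAMPLE_RATE,
        }


def make_silent_wav(seconds: float, rate: int) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)  # 16-bit samples
        out.setframerate(rate)
        out.writeframes(b"\x00\x00" * int(seconds * rate))
    buffer.seek(0)  # rewind so the buffer can be read back
    return buffer


print(describe_wav(make_silent_wav(1.0, 16_000)))
```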


# Model Description

This model is an automatic speech recognition (ASR) system, fine-tuned from `openai/whisper-small`, that transcribes Wayuunaiki audio into text. It was trained for 4,000 steps, during which the training loss fell from 0.016 at step 1,000 to 0.0002 at step 4,000.

## Training Statistics

- **Training Loss at Step 1,000:** 0.016
- **Final Training Loss (Step 4,000):** 0.0002
- **Average Training Loss:** 0.161

## Validation Statistics (at the end of training)

- **Validation Loss:** 0.567
- **Word Error Rate (WER):** 36.3%

## Performance Metrics

- **Training Runtime:** 13,696.04 seconds (about 3.8 hours)
- **Samples Processed Per Second:** 4.673
- **Steps Processed Per Second:** 0.292
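
These throughput figures are mutually consistent and support a few rough inferences. The sketch below multiplies each rate by the runtime; the resulting effective batch size (about 16 samples per step) and epoch count (about 35, assuming all 1,835 recordings were training examples) are derived estimates, not settings documented in this card, and `training_breakdown` is a hypothetical helper.

```python
def training_breakdown(runtime_s: float, samples_per_s: float,
                       steps_per_s: float, dataset_size: int) -> dict:
    """Derive step count, samples per step and epochs from reported throughput."""
    steps = runtime_s * steps_per_s
    samples = runtime_s * samples_per_s
    return {
        "steps": steps,
        "samples_per_step": samples / steps,  # effective batch size
        "epochs": samples / dataset_size,
    }


# Figures reported above; dataset_size assumes every recording was a training example.
report = training_breakdown(13_696.0441, 4.673, 0.292, 1_835)
print({name: round(value, 1) for name, value in report.items()})
# → {'steps': 3999.2, 'samples_per_step': 16.0, 'epochs': 34.9}
```

The recovered step count (about 3,999) matches the 4,000 training steps. An effective batch of 16 could come from any combination of per-device batch size, gradient accumulation, and device count.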

Training loss decreased consistently over the run, and the model reached a validation WER of 36.3%, a promising baseline for Wayuunaiki speech recognition.