Update README.md
README.md
CHANGED
@@ -21,6 +21,16 @@ This model has been trained on a unique dataset derived from parsed audio and te
 
 This model represents an initial endeavor in the journey of developing transcription models specifically for indigenous languages. The creation and improvement of such models have profound societal implications. It not only helps in preserving and promoting indigenous languages but also serves as a valuable asset for linguistic studies, helping scholars and communities alike in understanding and promoting the rich cultural tapestry of indigenous languages.
 
+## Dataset Details
+
+The dataset consists of 1,835 audio recordings, each accompanied by its respective transcription. The lexical corpus encompasses approximately 3,000 unique words.
+
+- **Total Audio Duration**: 6241.65 seconds (approximately 1.7 hours)
+- **Average Audio Duration**: 3.41 seconds
+
+This collection of data serves as a foundational resource for understanding and processing the Wayuunaiki language.
+
+
 # Model Accuracy Warning
 
 While this model has shown promising results, it's essential to be aware of its limitations: