Upload README.md with huggingface_hub
README.md
ADDED
@@ -0,0 +1,68 @@
---
language:
- en
tags:
- audio
- language-identification
- speech
- indian-languages
datasets:
- hmsolanki/indian-languages-audio-dataset
metrics:
- accuracy
- f1
---

# Indian Language Identification Model

This model identifies the language spoken in an audio clip from a set of 10 Indian languages.

## Model Details

- **Model Type:** Audio Language Classifier
- **Languages Supported:** Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Punjabi, Tamil, Telugu, Urdu
- **Framework:** PyTorch
- **Training Dataset:** [Indian Languages Audio Dataset](https://www.kaggle.com/datasets/hmsolanki/indian-languages-audio-dataset/)
- **Audio Sampling Rate:** 16 kHz

## Performance

- **Accuracy:** 0.8465
- **Precision:** 0.8457
- **Recall:** 0.8465
- **F1 Score:** 0.8452

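The card does not state how precision, recall and F1 are averaged across the 10 languages. The sketch below assumes support-weighted averaging (an assumption, not something the source confirms) and shows how such figures can be computed from true and predicted labels. The function name `weighted_metrics` and the toy labels are illustrative only. Under this kind of averaging, recall always equals accuracy, which would be consistent with the two values listed above being identical.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall and F1."""
    support = Counter(y_true)      # number of true samples per label
    predicted = Counter(y_pred)    # number of predictions per label
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    total = len(y_true)
    precision = recall = f1 = 0.0
    for label, n in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / n
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = n / total         # weight each label by its share of samples
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(correct.values()) / total
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only; these are not the model's outputs.
y_true = ["Hindi", "Hindi", "Urdu", "Tamil", "Tamil", "Tamil"]
y_pred = ["Hindi", "Urdu", "Urdu", "Tamil", "Tamil", "Hindi"]
metrics = weighted_metrics(y_true, y_pred)
print(metrics)
```
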
## Usage

```python
import torchaudio
from transformers import pipeline

# Load the model
pipe = pipeline("audio-classification", model="prithvirajjadhav2266/indian-language-identifier")

# Load the audio and resample it to the 16 kHz rate the model expects
waveform, sample_rate = torchaudio.load("path/to/audio.wav")
if sample_rate != 16000:
    resampler = torchaudio.transforms.Resample(sample_rate, 16000)
    waveform = resampler(waveform)

# The pipeline expects single-channel audio as a 1-D NumPy array, so downmix to mono
audio = waveform.mean(dim=0).numpy()

# Get prediction
prediction = pipe(audio)
print(f"Detected language: {prediction[0]['label']}")
```

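An audio-classification pipeline returns a list of dictionaries, each with a `label` and a `score`, so it can help to look beyond the top entry when two languages score closely. The following is a minimal sketch of formatting such output. `format_predictions` is a hypothetical helper, and the sample labels and scores are made up for illustration; they are not output from this model.

```python
def format_predictions(predictions, top_k=3):
    """Return the top_k predictions as 'label: score%' strings, best first."""
    ranked = sorted(predictions, key=lambda item: item["score"], reverse=True)
    return [f"{item['label']}: {item['score']:.1%}" for item in ranked[:top_k]]


# Made-up scores in the shape returned by an audio-classification pipeline.
sample_output = [
    {"label": "Urdu", "score": 0.21},
    {"label": "Hindi", "score": 0.72},
    {"label": "Punjabi", "score": 0.05},
    {"label": "Marathi", "score": 0.02},
]
lines = format_predictions(sample_output)
print("\n".join(lines))
```
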
## Limitations

- Works best with clear audio without background noise
- Audio should be sampled at 16 kHz for optimal performance

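Since the sampling rate matters, it may be worth checking a file before running inference. Below is a minimal sketch using only Python's standard `wave` module, which reads uncompressed PCM WAV files. The helper names `wav_sample_rate` and `needs_resampling` are hypothetical and not part of this model's code.

```python
import wave


def wav_sample_rate(path):
    """Return the sampling rate (Hz) stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, target_rate=16000):
    """Return True when the file's rate differs from the target rate."""
    return wav_sample_rate(path) != target_rate
```

A call such as `needs_resampling("path/to/audio.wav")` returning `True` indicates that the clip should be resampled before being passed to the model.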
## Training Details

This model was trained on the Indian Languages Audio Dataset listed under Model Details. The architecture combines CNN layers for feature extraction with transformer layers for classification.

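The card gives no layer sizes or input features, so the following is only a rough sketch of what a CNN plus transformer classifier can look like in PyTorch. It assumes log-mel spectrogram input with 64 mel bins; the class name `CnnTransformerClassifier` and every hyperparameter are hypothetical and are not taken from this model. Positional encoding is omitted for brevity.

```python
import torch
from torch import nn


class CnnTransformerClassifier(nn.Module):
    """Hypothetical CNN + transformer classifier over log-mel spectrograms."""

    def __init__(self, n_mels=64, d_model=128, num_classes=10):
        super().__init__()
        # CNN front end: extracts local time-frequency features and
        # downsamples both axes by a factor of 4.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.project = nn.Linear(64 * (n_mels // 4), d_model)
        # Transformer encoder: models longer-range context across time frames.
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, dim_feedforward=256, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, spectrogram):
        # spectrogram: (batch, n_mels, time)
        x = self.cnn(spectrogram.unsqueeze(1))   # (batch, 64, n_mels/4, time/4)
        x = x.permute(0, 3, 1, 2).flatten(2)     # (batch, time/4, 64 * n_mels/4)
        x = self.encoder(self.project(x))        # (batch, time/4, d_model)
        return self.classifier(x.mean(dim=1))    # (batch, num_classes)


model = CnnTransformerClassifier()
logits = model(torch.randn(2, 64, 100))  # two random 100-frame inputs
print(logits.shape)
```

The output has one logit per supported language for each clip in the batch; averaging over time frames before the final linear layer is one simple pooling choice among several.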
## Confusion Matrix


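A confusion matrix tabulates how often each true language is predicted as each other language. The sketch below counts those pairs and lists the most frequent mistakes. It uses made-up labels rather than this model's outputs, and the function names `confusion_counts` and `most_confused` are hypothetical.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count (true label, predicted label) pairs."""
    return Counter(zip(y_true, y_pred))


def most_confused(y_true, y_pred, top_k=3):
    """Return the most frequent misclassified (true, predicted) pairs."""
    counts = confusion_counts(y_true, y_pred)
    errors = Counter({pair: n for pair, n in counts.items() if pair[0] != pair[1]})
    return errors.most_common(top_k)


# Toy labels for illustration only; these are not the model's outputs.
y_true = ["Hindi", "Hindi", "Hindi", "Urdu", "Urdu", "Tamil"]
y_pred = ["Hindi", "Urdu", "Urdu", "Urdu", "Hindi", "Tamil"]
print(most_confused(y_true, y_pred))
```
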