Commit 983f020 by hriteshMaikap · verified · 1 Parent(s): ea7a533

Upload README.md with huggingface_hub

Files changed (1): README.md (added, +68 -0)
---
language:
- en
tags:
- audio
- language-identification
- speech
- indian-languages
datasets:
- hmsolanki/indian-languages-audio-dataset
metrics:
- accuracy
- f1
---

# Indian Language Identification Model

This model identifies the language spoken in an audio clip from a set of 10 Indian languages.

## Model Details

- **Model Type:** Audio Language Classifier
- **Languages Supported:** Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Punjabi, Tamil, Telugu, Urdu
- **Framework:** PyTorch
- **Training Dataset:** [Indian Languages Audio Dataset](https://www.kaggle.com/datasets/hmsolanki/indian-languages-audio-dataset/)
- **Audio Sampling Rate:** 16 kHz

## Performance

- **Accuracy:** 0.8465
- **Precision:** 0.8457
- **Recall:** 0.8465
- **F1 Score:** 0.8452
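
The card does not say how precision, recall, and F1 are averaged across the 10 classes. The sketch below assumes support-weighted averaging, under which recall always equals accuracy (as it does in the figures above); this is an assumption, not a documented detail of the evaluation. The function name `weighted_metrics` and the toy labels are illustrative only.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)  # number of true samples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = count / n  # class weight = share of true samples
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (not taken from the real evaluation set)
y_true = ["Hindi", "Hindi", "Tamil", "Tamil", "Urdu", "Urdu"]
y_pred = ["Hindi", "Tamil", "Tamil", "Tamil", "Urdu", "Hindi"]
metrics = weighted_metrics(y_true, y_pred)
print(metrics)
```

With weighted averaging, each class's recall is weighted by its share of the true labels, so the weighted sum collapses to the fraction of correct predictions, which is the accuracy.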

## Usage

```python
import torchaudio
from transformers import pipeline

# Load the model
pipe = pipeline("audio-classification", model="prithvirajjadhav2266/indian-language-identifier")

# Load the audio and resample to 16 kHz if needed
waveform, sample_rate = torchaudio.load("path/to/audio.wav")
if sample_rate != 16000:
    resampler = torchaudio.transforms.Resample(sample_rate, 16000)
    waveform = resampler(waveform)

# Mix down to mono and convert to the 1-D NumPy array the pipeline expects
audio = waveform.mean(dim=0).numpy()

# Get prediction
prediction = pipe({"raw": audio, "sampling_rate": 16000})
print(f"Detected language: {prediction[0]['label']}")
```

## Limitations

- Works best on clear audio with little background noise
- Audio should be sampled at 16 kHz; resample other rates before inference
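
A quick way to act on the sampling-rate limitation is to inspect a file before running inference. The sketch below uses only Python's standard `wave` module, so it handles uncompressed PCM WAV files only; the helper name `check_wav` and the returned keys are illustrative, not part of this model's code.

```python
import wave

TARGET_RATE = 16000  # sampling rate stated in the model details


def check_wav(path, target_rate=TARGET_RATE):
    """Report basic properties of a PCM WAV file and whether it needs resampling."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return {
        "sample_rate": rate,
        "channels": channels,
        "duration_s": duration,
        "needs_resampling": rate != target_rate,
    }
```

Calling `check_wav("path/to/audio.wav")` returns a dictionary; if `needs_resampling` is `True`, resample the audio to 16 kHz before passing it to the model.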

## Training Details

This model was trained on the Indian Languages Audio Dataset, a collection of Indian-language audio samples. The architecture combines CNN layers for feature extraction with transformer layers for classification.
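
The card gives no layer sizes or input features, so the following is only a minimal sketch of what a CNN-plus-transformer classifier of this general shape can look like in PyTorch. Everything specific here is an assumption: the class name `CnnTransformerClassifier`, the log-mel spectrogram input, the channel counts, and the number of heads and layers are all hypothetical and not taken from the released model.

```python
import torch
from torch import nn


class CnnTransformerClassifier(nn.Module):
    """Hypothetical sketch: CNN front end followed by a transformer encoder."""

    def __init__(self, n_mels=64, d_model=128, n_heads=4, n_layers=2, n_classes=10):
        super().__init__()
        # CNN feature extractor; each pooling step halves frequency and time
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Project each time step's flattened CNN features to the model width
        self.proj = nn.Linear(64 * (n_mels // 4), d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=256, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)  # one logit per language

    def forward(self, spec):
        # spec: (batch, 1, n_mels, time) spectrogram (assumed input format)
        x = self.cnn(spec)                          # (batch, 64, n_mels/4, time/4)
        b, c, f, t = x.shape
        x = x.permute(0, 3, 1, 2).reshape(b, t, c * f)  # sequence over time
        x = self.encoder(self.proj(x))              # (batch, time/4, d_model)
        return self.head(x.mean(dim=1))             # mean-pool over time


model = CnnTransformerClassifier().eval()
with torch.no_grad():
    logits = model(torch.randn(2, 1, 64, 100))  # two random "clips"
print(logits.shape)
```

The convolutional layers summarize local time-frequency patterns, and the transformer layers then relate those summaries across the whole clip before a single linear layer produces the 10 class scores.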

## Confusion Matrix

![Confusion Matrix](/confusion_matrix.png)