hriteshMaikap committed (verified)
Commit d3e72d1 · 1 Parent(s): 16bc4c2

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +81 -54
README.md CHANGED
@@ -1,68 +1,95 @@
  ---
  language:
- - en
  tags:
- - audio
- - language-identification
- - speech
- - indian-languages
- datasets:
- - hmsolanki/indian-languages-audio-dataset
- metrics:
- - accuracy
- - f1
  ---

- # Indian Language Identification Model
-
- This model identifies the language spoken in an audio clip from a set of 10 Indian languages.
-
- ## Model Details
-
- - **Model Type:** Audio Language Classifier
- - **Languages Supported:** Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Punjabi, Tamil, Telugu, Urdu
- - **Framework:** PyTorch
- - **Training Dataset:** [Indian Languages Audio Dataset](https://www.kaggle.com/datasets/hmsolanki/indian-languages-audio-dataset/)
- - **Audio Sampling Rate:** 16kHz
-
- ## Performance
-
- - **Accuracy:** 0.8465
- - **Precision:** 0.8457
- - **Recall:** 0.8465
- - **F1 Score:** 0.8452
-
  ## Usage

  ```python
- import torch
- import torchaudio
  import json
- from transformers import pipeline
-
- # Load the model
- pipe = pipeline("audio-classification", model="hriteshMaikap/languageClassifier")
-
- # Or use it directly
- waveform, sample_rate = torchaudio.load("path/to/audio.wav")
- if sample_rate != 16000:
-     resampler = torchaudio.transforms.Resample(sample_rate, 16000)
-     waveform = resampler(waveform)
-
- # Get prediction
- prediction = pipe(waveform)
- print(f"Detected language: {prediction[0]['label']}")
- ```
-
- ## Limitations
-
- - Works best with clear audio without background noise
- - Audio should be sampled at 16kHz for optimal performance
-
- ## Training Details
-
- This model was trained on a dataset of Indian language audio samples. The model architecture combines CNN layers for feature extraction with transformer layers for classification.
-
- ## Confusion Matrix
-
- ![Confusion Matrix](/confusion_matrix.png)

  ---
  language:
+ - mr
+ - te
+ - ml
  tags:
+ - audio-classification
+ - speech-recognition
+ - indian-languages
+ - tensorflow
+ license: apache-2.0
  ---

+ # Language Classifier - Indian Languages (Marathi, Telugu, Malayalam)
+
+ This model classifies audio samples into three Indian languages: Marathi, Telugu, and Malayalam.
+
+ ## Model Description
+
+ ### Model Architecture
+ - 1D Convolutional Neural Network (CNN) with the following key components:
+   - 3 Convolutional blocks with increasing filters (64, 128, 256)
+   - Batch Normalization and ReLU activation after each convolution
+   - MaxPooling and Dropout for regularization
+   - Dense layers with 256 units followed by a Softmax output layer
+ - Input: Audio features (MFCC + Delta features)
+ - Output: Language classification probabilities
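The layer-construction code itself is not part of this commit, so the following is only a minimal Keras sketch of a network matching the description above. The kernel sizes, pooling and dropout settings, the use of `Flatten` before the dense layer, and the `(174, 39)` input shape (174 frames × 39 coefficients from 13 MFCCs plus delta and delta-delta) are assumptions, not values read from the repository.

```python
from tensorflow.keras import layers, models

def build_model(max_pad_len=174, n_features=39, n_classes=3):
    """Sketch of the described 1D CNN: three conv blocks (64/128/256 filters),
    each with batch norm, ReLU, max pooling and dropout, then dense + softmax."""
    inputs = layers.Input(shape=(max_pad_len, n_features))
    x = inputs
    for filters in (64, 128, 256):
        x = layers.Conv1D(filters, kernel_size=3, padding="same")(x)  # kernel size assumed
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
        x = layers.MaxPooling1D(pool_size=2)(x)
        x = layers.Dropout(0.3)(x)  # dropout rate assumed
    x = layers.Flatten()(x)
    x = layers.Dense(256, activation="relu")(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    return models.Model(inputs, outputs)

model = build_model()
model.summary()
```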
+
+ ### Training Data
+ The model was trained on:
+ - Total samples per language: 1000
+ - Training: 700 samples
+ - Validation: 150 samples
+ - Test: 150 samples
+
+ ### Features
+ - MFCC (Mel-frequency cepstral coefficients) with delta features
+ - Number of MFCC coefficients: 13
+ - Maximum padding length: 174 frames
+ - Feature type: MFCC with delta and delta-delta features
+
+ ### Training Hyperparameters
+ - Optimizer: AdamW
+ - Learning rate: 0.001
+ - Batch size: 64
+ - Early stopping with patience of 10
+ - Learning rate reduction on plateau
+ - Loss function: Categorical cross-entropy
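The training script is likewise not included in the commit; the snippet below is only a hedged sketch of how the listed hyperparameters could be wired together in Keras, reusing the `model` sketched above. `X_train`, `y_train`, `X_val`, and `y_val` are placeholders for arrays shaped `(n, 174, 39)` with one-hot labels, and the epoch budget, monitored metric, and learning-rate reduction factor/patience are assumptions.

```python
import tensorflow as tf

# Optimizer and loss from the hyperparameter list: AdamW, lr 0.001, categorical cross-entropy.
model.compile(optimizer=tf.keras.optimizers.AdamW(learning_rate=1e-3),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

callbacks = [
    # Early stopping with patience of 10 (monitored metric assumed to be validation loss)
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True),
    # Learning-rate reduction on plateau (factor and patience assumed)
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3),
]

history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=100,          # epoch budget assumed; early stopping decides the end
                    batch_size=64,
                    callbacks=callbacks)
```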

+ ## Performance

+ The model achieves strong performance in distinguishing between Marathi, Telugu, and Malayalam speech samples.
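This version of the card does not report quantitative metrics. As a hedged illustration only, accuracy and per-language precision/recall/F1 on the held-out test split could be computed as below; `X_test`/`y_test` are placeholders and the label order passed to `target_names` is an assumption.

```python
import numpy as np
from sklearn.metrics import classification_report

# Overall loss/accuracy on the held-out test split (placeholder arrays)
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f"Test accuracy: {test_acc:.4f}")

# Per-language precision, recall and F1
y_pred = np.argmax(model.predict(X_test), axis=1)
y_true = np.argmax(y_test, axis=1)
print(classification_report(y_true, y_pred,
                            target_names=["Marathi", "Telugu", "Malayalam"]))  # label order assumed
```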
 
 
 
 

+ ### Intended Use
+ This model is designed for:
+ - Language identification in audio samples
+ - Speech processing applications focusing on Indian languages
+ - Research and development in multilingual speech systems

+ ### Limitations
+ - Limited to three languages: Marathi, Telugu, Malayalam
+ - Fixed input length requirement
+ - May not perform well on very noisy audio
+ - Not suitable for real-time processing without proper preprocessing

  ## Usage

  ```python
+ import tensorflow as tf
+ import numpy as np
+ import joblib
  import json
+ import librosa
+
+ # Load the model, scaler, and config
+ model = tf.keras.models.load_model('indic_language_classifier_mtm.keras')
+ scaler = joblib.load('audio_feature_scaler_mtm.pkl')
+ with open('config_mtm.json', 'r') as f:
+     config = json.load(f)
+
+ def extract_features(audio_path, config):
+     audio, sr = librosa.load(audio_path, sr=None)
+     mfccs = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=config['n_mfcc'])
+     delta_mfccs = librosa.feature.delta(mfccs)
+     delta2_mfccs = librosa.feature.delta(mfccs, order=2)
+     features = np.concatenate((mfccs, delta_mfccs, delta2_mfccs), axis=0)
+
+     # Pad or truncate
+     if features.shape[1] > config['max_pad_len']:
+         features = features[:, :config['max_pad_len']]
+     else:
+         pad_width = config['max_pad_len'] - features.shape[1]
+         features = np.pad(features, pad_width=((0, 0), (0, pad_width)))
+
+     return features.T
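
# NOTE: the committed example is truncated at this point in the diff. The lines
# below are an illustrative sketch only (not part of the commit) of how inference
# might continue; the per-frame scaler usage and the plain index output are
# assumptions about artifacts not shown here.

# Extract features for one clip, scale them, and add a batch dimension
features = extract_features("path/to/audio.wav", config)   # shape: (max_pad_len, 3 * n_mfcc)
features = scaler.transform(features)                      # assumes the scaler was fit on per-frame features
features = features[np.newaxis, ...]                       # shape: (1, max_pad_len, 3 * n_mfcc)

# Predict and report the most likely class
probabilities = model.predict(features)[0]
predicted_index = int(np.argmax(probabilities))
print("Predicted class index:", predicted_index)           # config may map indices to language names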