---
language:
- mr
- te
- ml
tags:
- audio-classification
- speech-recognition
- indian-languages
- tensorflow
license: apache-2.0
---

# Language Classifier - Indian Languages (Marathi, Telugu, Malayalam)

This model classifies audio samples into three Indian languages: Marathi, Telugu, and Malayalam.

## Model Description

### Model Architecture

- 1D Convolutional Neural Network (CNN) with the following key components:
  - 3 convolutional blocks with increasing filter counts (64, 128, 256)
  - Batch normalization and ReLU activation after each convolution
  - Max pooling and dropout for regularization
  - Dense layers with 256 units, followed by a softmax output layer
- Input: audio features (MFCC plus delta and delta-delta features)
- Output: classification probabilities over the three languages
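
The architecture described above could be sketched in Keras roughly as follows. This is an illustration rather than the released model's actual definition: the kernel size, pool size, dropout rates, the global pooling step, the use of a single 256-unit dense layer, and the helper name `build_language_classifier` are all assumptions, since the card does not specify them. The input shape `(174, 39)` follows from 174 frames of 13 MFCCs stacked with their delta and delta-delta features.

```python
import tensorflow as tf

layers = tf.keras.layers


def build_language_classifier(input_shape=(174, 39), num_classes=3):
    """Hypothetical 1D CNN following the card's high-level description."""
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=input_shape))

    # Three convolutional blocks with increasing filter counts.
    for filters in (64, 128, 256):
        model.add(layers.Conv1D(filters, kernel_size=3, padding="same"))  # kernel size assumed
        model.add(layers.BatchNormalization())
        model.add(layers.Activation("relu"))
        model.add(layers.MaxPooling1D(pool_size=2))  # pool size assumed
        model.add(layers.Dropout(0.3))  # dropout rate assumed

    # Collapse the time axis before the dense head (assumed; Flatten is another option).
    model.add(layers.GlobalAveragePooling1D())
    model.add(layers.Dense(256, activation="relu"))
    model.add(layers.Dropout(0.5))  # dropout rate assumed
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model


model = build_language_classifier()
```

Calling `model.summary()` on the result prints the per-layer output shapes, which is a quick way to compare such a sketch against the released `.keras` file.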

### Training Data

The model was trained on:

- Total samples per language: 1000
  - Training: 700 samples
  - Validation: 150 samples
  - Test: 150 samples
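
A per-language 700/150/150 split like the one listed above can be produced along these lines. This is a minimal sketch using NumPy only; the label order (`mr`, `te`, `ml`), the random seed, and the helper name `split_indices` are assumptions, and the card does not say how the original split was drawn.

```python
import numpy as np

LANGUAGES = ["mr", "te", "ml"]  # label order is an assumption
SAMPLES_PER_LANGUAGE = 1000
SPLIT_SIZES = {"train": 700, "val": 150, "test": 150}


def split_indices(seed=0):
    """Return {split_name: [(label, sample_index), ...]} with a fixed split per language."""
    rng = np.random.default_rng(seed)
    splits = {name: [] for name in SPLIT_SIZES}
    for label, _lang in enumerate(LANGUAGES):
        # Shuffle this language's sample indices, then carve out consecutive chunks.
        order = rng.permutation(SAMPLES_PER_LANGUAGE)
        start = 0
        for name, size in SPLIT_SIZES.items():
            chunk = order[start:start + size]
            splits[name].extend((label, int(i)) for i in chunk)
            start += size
    return splits


splits = split_indices()
```

Splitting within each language keeps the three classes balanced in every subset, which matters for a small dataset of this size.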

### Features

- Feature type: MFCC (Mel-frequency cepstral coefficients) with delta and delta-delta features
- Number of MFCC coefficients: 13 (39 features per frame once the delta and delta-delta features are stacked)
- Maximum padding length: 174 frames

### Training Hyperparameters

- Optimizer: AdamW
- Learning rate: 0.001
- Batch size: 64
- Early stopping with patience of 10
- Learning rate reduction on plateau
- Loss function: Categorical Cross-entropy
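
In Keras, the hyperparameters above might translate into a setup like the one below. This is a sketch under stated assumptions: the monitored metric (`val_loss`), `restore_best_weights`, the plateau `factor` and `patience`, the epoch cap, and the helper names are not given in the card, and `tf.keras.optimizers.AdamW` requires a reasonably recent TensorFlow release.

```python
import tensorflow as tf


def make_training_config():
    """Build the optimizer and callbacks; values not listed in the card are assumptions."""
    optimizer = tf.keras.optimizers.AdamW(learning_rate=0.001)
    callbacks = [
        # Stop when validation loss has not improved for 10 epochs.
        tf.keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=10, restore_best_weights=True
        ),
        # Reduce the learning rate when validation loss plateaus (factor/patience assumed).
        tf.keras.callbacks.ReduceLROnPlateau(
            monitor="val_loss", factor=0.5, patience=5
        ),
    ]
    return optimizer, callbacks


def compile_and_fit(model, x_train, y_train, x_val, y_val, epochs=100):
    """Compile and train `model`; labels are expected to be one-hot encoded."""
    optimizer, callbacks = make_training_config()
    model.compile(
        optimizer=optimizer,
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model.fit(
        x_train,
        y_train,
        validation_data=(x_val, y_val),
        batch_size=64,
        epochs=epochs,  # upper bound only; early stopping usually ends training sooner
        callbacks=callbacks,
    )


optimizer, callbacks = make_training_config()
```

Categorical cross-entropy expects one-hot labels; with integer labels, `sparse_categorical_crossentropy` would be the matching loss.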

## Performance

The model achieves strong performance in distinguishing between Marathi, Telugu, and Malayalam speech samples.

### Intended Use

This model is designed for:

- Language identification in audio samples
- Speech processing applications focusing on Indian languages
- Research and development in multilingual speech systems

### Limitations

- Limited to three languages: Marathi, Telugu, and Malayalam
- Fixed input length: features are padded or truncated to 174 frames
- May not perform well on very noisy audio
- Not suitable for real-time processing without proper preprocessing
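
One possible workaround for the fixed input length is to split a longer recording's feature matrix into 174-frame windows, classify each window, and average the resulting probabilities. The sketch below shows only the windowing and averaging steps with NumPy. It is not part of the released pipeline: the helper names are hypothetical, and because the model was presumably trained on single padded or truncated clips, accuracy with this approach is untested.

```python
import numpy as np

MAX_PAD_LEN = 174  # frames, matching the maximum padding length in the Features section


def split_into_windows(features, window=MAX_PAD_LEN):
    """Split a (frames, n_features) array into zero-padded windows of `window` frames."""
    n_frames, n_features = features.shape
    n_windows = max(1, int(np.ceil(n_frames / window)))
    # Zero-pad so the frame count is an exact multiple of the window length.
    padded = np.zeros((n_windows * window, n_features), dtype=features.dtype)
    padded[:n_frames] = features
    return padded.reshape(n_windows, window, n_features)


def average_predictions(window_probs):
    """Average per-window class probabilities into one prediction per recording."""
    return np.mean(window_probs, axis=0)


# Example with dummy data: 400 frames of 39 features.
dummy_features = np.ones((400, 39), dtype=np.float32)
windows = split_into_windows(dummy_features)
```

Each window has the shape the model expects, so the stacked windows can be passed to the model as one batch (after any scaling the pipeline applies) and the per-window outputs combined with `average_predictions`.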

## Usage

```python
import json

import joblib
import librosa
import numpy as np
import tensorflow as tf

# Load the model, scaler, and config
model = tf.keras.models.load_model('indic_language_classifier_mtm.keras')
scaler = joblib.load('audio_feature_scaler_mtm.pkl')
with open('config_mtm.json', 'r') as f:
    config = json.load(f)


def extract_features(audio_path, config):
    # Load the audio at its native sampling rate
    audio, sr = librosa.load(audio_path, sr=None)

    # Compute MFCCs plus first- and second-order deltas, then stack them
    # along the feature axis: shape (3 * n_mfcc, n_frames)
    mfccs = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=config['n_mfcc'])
    delta_mfccs = librosa.feature.delta(mfccs)
    delta2_mfccs = librosa.feature.delta(mfccs, order=2)
    features = np.concatenate((mfccs, delta_mfccs, delta2_mfccs), axis=0)

    # Pad with zeros or truncate along the time axis to max_pad_len frames
    if features.shape[1] > config['max_pad_len']:
        features = features[:, :config['max_pad_len']]
    else:
        pad_width = config['max_pad_len'] - features.shape[1]
        features = np.pad(features, pad_width=((0, 0), (0, pad_width)), mode='constant')

    # Transpose to (n_frames, n_features), the layout the 1D CNN expects
    return features.T
