hriteshMaikap/languageClassifier

	---
	language:
	- mr
	- te
	- ml
	tags:
	- audio-classification
	- speech-recognition
	- indian-languages
	- tensorflow
	license: apache-2.0
	---

	# Language Classifier - Indian Languages (Marathi, Telugu, Malayalam)

	This model classifies audio samples into three Indian languages: Marathi, Telugu, and Malayalam.

	## Model Description

	### Model Architecture
	The model is a 1D Convolutional Neural Network (CNN) with the following key components:
	- 3 convolutional blocks with increasing filter counts (64, 128, 256)
	- Batch Normalization and ReLU activation after each convolution
	- MaxPooling and Dropout for regularization
	- Dense layers with 256 units followed by a Softmax output layer

	Input and output:
	- Input: Audio features (MFCCs with delta and delta-delta features)
	- Output: Language classification probabilities
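The architecture described above can be sketched in Keras roughly as follows. This is a minimal sketch, not the author's training code: the kernel size (3), pool size (2), dropout rate (0.3), the use of global average pooling before the dense layer, and the single 256-unit dense layer are all assumptions made for illustration. The input shape of 174 frames by 39 features follows from the feature settings in this card (13 MFCCs plus delta and delta-delta, padded to 174 frames).

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_model(n_frames=174, n_features=39, n_classes=3):
    """Hypothetical 1D CNN matching the component list in this card."""
    inputs = tf.keras.Input(shape=(n_frames, n_features))
    x = inputs
    # Three convolutional blocks with increasing filter counts
    for filters in (64, 128, 256):
        x = layers.Conv1D(filters, kernel_size=3, padding="same")(x)  # kernel size assumed
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
        x = layers.MaxPooling1D(pool_size=2)(x)  # pool size assumed
        x = layers.Dropout(0.3)(x)  # dropout rate assumed
    # Collapse the time axis (assumed; a Flatten layer is another option)
    x = layers.GlobalAveragePooling1D()(x)
    x = layers.Dense(256, activation="relu")(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)


model = build_model()
model.summary()
```

The softmax layer yields one probability per language, so the predicted language is the index of the largest value in the output vector.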

	### Training Data
	The dataset contains 1000 samples per language, split as follows:
	- Training: 700 samples per language
	- Validation: 150 samples per language
	- Test: 150 samples per language
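A 700/150/150 split per language could be produced along these lines. This is a sketch under assumptions: the function name `split_per_language`, the random seed, and the label codes are hypothetical, and the card does not say how the original split was drawn.

```python
import numpy as np


def split_per_language(labels, n_train=700, n_val=150, n_test=150, seed=0):
    """Return (train, val, test) index arrays with fixed counts per language."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for lang in np.unique(labels):
        # Shuffle the indices belonging to this language
        idx = rng.permutation(np.flatnonzero(labels == lang))
        if len(idx) < n_train + n_val + n_test:
            raise ValueError(f"not enough samples for language {lang!r}")
        train.append(idx[:n_train])
        val.append(idx[n_train:n_train + n_val])
        test.append(idx[n_train + n_val:n_train + n_val + n_test])
    return tuple(np.concatenate(part) for part in (train, val, test))


# Synthetic labels: 1000 samples for each of the three languages
labels = np.repeat(["mr", "te", "ml"], 1000)
train_idx, val_idx, test_idx = split_per_language(labels)
print(len(train_idx), len(val_idx), len(test_idx))  # 2100 450 450
```

Splitting within each language keeps the three classes balanced in every subset.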

	### Features
	- Feature type: MFCCs (Mel-frequency cepstral coefficients) with delta and delta-delta features
	- Number of MFCC coefficients: 13 (39 features per frame once the delta and delta-delta features are stacked)
	- Maximum padding length: 174 frames

	### Training Hyperparameters
	- Optimizer: AdamW
	- Learning rate: 0.001
	- Batch size: 64
	- Early stopping with patience of 10
	- Learning rate reduction on plateau
	- Loss function: Categorical Cross-entropy
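The hyperparameters listed above map onto Keras roughly as shown below. Treat this as a sketch: the monitored metric (`val_loss`), `restore_best_weights`, and the `factor` and `patience` values for the learning rate reduction are assumptions not stated in this card, and the small model is only a stand-in so the snippet runs on its own.

```python
import tensorflow as tf


def compile_for_training(model):
    """Compile with the listed optimizer and loss; return the callbacks."""
    model.compile(
        optimizer=tf.keras.optimizers.AdamW(learning_rate=1e-3),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return [
        # Early stopping with patience of 10 (monitored metric assumed)
        tf.keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=10, restore_best_weights=True
        ),
        # Learning rate reduction on plateau (factor and patience assumed)
        tf.keras.callbacks.ReduceLROnPlateau(
            monitor="val_loss", factor=0.5, patience=5
        ),
    ]


# Stand-in model with the input shape implied by the feature settings
model = tf.keras.Sequential([
    tf.keras.Input(shape=(174, 39)),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(3, activation="softmax"),
])
callbacks = compile_for_training(model)

# Training call with hypothetical arrays and a hypothetical epoch count:
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           batch_size=64, epochs=100, callbacks=callbacks)
```

Categorical cross-entropy expects one-hot encoded labels, so integer language labels would need converting first, for example with `tf.keras.utils.to_categorical`.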

	## Performance

	The model achieves strong performance in distinguishing between Marathi, Telugu, and Malayalam speech samples.

	## Intended Use
	This model is designed for:
	- Language identification in audio samples
	- Speech processing applications focusing on Indian languages
	- Research and development in multilingual speech systems

	## Limitations
	- Limited to three languages: Marathi, Telugu, Malayalam
	- Requires fixed-length input: features are truncated or zero-padded to 174 frames
	- May not perform well on very noisy audio
	- Not suitable for real-time processing unless incoming audio goes through the same feature extraction and padding steps first

	## Usage

	```python
	import tensorflow as tf
	import numpy as np
	import joblib
	import json
	import librosa

	# Load the model, scaler, and config
	model = tf.keras.models.load_model('indic_language_classifier_mtm.keras')
	scaler = joblib.load('audio_feature_scaler_mtm.pkl')
	with open('config_mtm.json', 'r') as f:
	    config = json.load(f)

	def extract_features(audio_path, config):
	    # Load at the file's native sampling rate (sr=None disables resampling)
	    audio, sr = librosa.load(audio_path, sr=None)
	    # MFCCs plus first- and second-order deltas, stacked along the feature axis
	    mfccs = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=config['n_mfcc'])
	    delta_mfccs = librosa.feature.delta(mfccs)
	    delta2_mfccs = librosa.feature.delta(mfccs, order=2)
	    features = np.concatenate((mfccs, delta_mfccs, delta2_mfccs), axis=0)

	    # Truncate or zero-pad along the time axis to max_pad_len frames
	    if features.shape[1] > config['max_pad_len']:
	        features = features[:, :config['max_pad_len']]
	    else:
	        pad_width = config['max_pad_len'] - features.shape[1]
	        features = np.pad(features, pad_width=((0, 0), (0, pad_width)))

	    # Transpose to (frames, features)
	    return features.T