---
title: Vocal Emotion Recognition
emoji: 🎀
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 3.50.2
app_file: app.py
pinned: false
---
# Vocal Emotion Recognition System
## 🎯 Project Overview
A deep learning system for real-time emotion recognition from voice input. It combines audio feature extraction with a pre-trained transformer classifier and serves the results through a web interface.
### Key Features
- Real-time vocal emotion analysis
- Advanced audio feature extraction
- Pre-trained transformer model integration
- User-friendly web interface
- Comprehensive evaluation metrics
## 🛠️ Technical Architecture
### Components
1. **Audio Processing Pipeline**
   - Sample rate standardization (16 kHz)
   - Noise reduction and normalization
   - Feature extraction (MFCC, Chroma, Mel spectrograms)
2. **Machine Learning Pipeline**
   - DistilBERT-based emotion classification
   - Transfer learning capabilities
   - Comprehensive evaluation metrics
3. **Web Interface**
   - Gradio-based interactive UI
   - Real-time processing
   - Intuitive result visualization
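
The first two audio-processing steps can be sketched as follows. This is a minimal illustration, not the project's own code: the function names `resample_linear` and `peak_normalize` are hypothetical, linear interpolation stands in for a proper band-limited resampler, and noise reduction is not shown.

```python
import numpy as np

TARGET_SR = 16_000  # sample rate named in this README


def resample_linear(signal: np.ndarray, orig_sr: int, target_sr: int = TARGET_SR) -> np.ndarray:
    """Resample a mono signal by linear interpolation (simple stand-in, no anti-aliasing)."""
    if orig_sr == target_sr:
        return signal.astype(np.float32)
    n_out = int(round(len(signal) / orig_sr * target_sr))
    old_t = np.arange(len(signal)) / orig_sr   # timestamps of the input samples
    new_t = np.arange(n_out) / target_sr       # timestamps of the output samples
    return np.interp(new_t, old_t, signal).astype(np.float32)


def peak_normalize(signal: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    """Scale the signal so its largest absolute sample is 1.0."""
    if signal.size == 0:
        return signal
    peak = float(np.max(np.abs(signal)))
    return signal / max(peak, eps)  # eps avoids division by zero on silence
```

In practice a library resampler such as `librosa.resample` would likely be preferable, since it filters out frequencies above the new Nyquist limit.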
## 📦 Installation
1. **Clone the Repository**
```bash
git clone [repository-url]
cd vocal-emotion-recognition
```
2. **Install Dependencies**
```bash
pip install -r requirements.txt
```
3. **Environment Setup**
- Python 3.8+ required
- CUDA-compatible GPU recommended for training
- Microphone access required for real-time analysis
## 🚀 Usage
### Starting the Application
```bash
python app.py
```
- Access the web interface at `http://localhost:7860`
- Use microphone input for real-time analysis
- View emotion classification results instantly
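
For orientation, a microphone-to-label interface in Gradio 3.x could be wired up roughly like this. It is a sketch under assumptions: `softmax_scores` and `build_interface` are hypothetical names, and `predict_fn` stands for whatever function the app uses to map a recorded file to emotion scores.

```python
import math
from typing import Callable, Dict, Sequence


def softmax_scores(labels: Sequence[str], logits: Sequence[float]) -> Dict[str, float]:
    """Turn raw model outputs into a label -> probability dict, the format gr.Label accepts."""
    m = max(logits)                              # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


def build_interface(predict_fn: Callable[[str], Dict[str, float]]):
    """Build a Gradio 3.x interface; predict_fn receives the path of the recorded audio file."""
    import gradio as gr  # imported lazily so the helper above works without Gradio installed

    return gr.Interface(
        fn=predict_fn,
        inputs=gr.Audio(source="microphone", type="filepath"),  # Gradio 3.x argument name
        outputs=gr.Label(num_top_classes=3),
        title="Vocal Emotion Recognition",
    )
```

Calling `build_interface(predict_fn).launch(server_port=7860)` would then serve the UI on the port mentioned above. Note that Gradio 4 renamed the `source` argument to `sources`, so this form applies to the 3.50.2 SDK version pinned in the front matter.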
### Training Custom Models
```bash
python model_training.py --data_path [path] --epochs [num]
```
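
The two flags above could be parsed with `argparse` along these lines. This is an illustrative sketch only: the `build_parser` name, the help texts, and the default of 3 epochs are assumptions, not taken from `model_training.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command-line parser for the training flags shown in this README."""
    parser = argparse.ArgumentParser(description="Train a vocal emotion recognition model.")
    parser.add_argument("--data_path", required=True,
                        help="Location of the training data")
    parser.add_argument("--epochs", type=int, default=3,  # default value is an assumption
                        help="Number of training epochs")
    return parser
```

For example, `build_parser().parse_args(["--data_path", "data/", "--epochs", "5"])` yields a namespace with `data_path="data/"` and `epochs=5`.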
## 📊 Model Performance
The system is evaluated with the following metrics:
- Accuracy, Precision, Recall, F1 Score
- ROC-AUC Score
- Confusion Matrix
- MAE and RMSE
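
To make the classification metrics concrete, here is a small dependency-free sketch of a confusion matrix with accuracy and macro-averaged precision, recall, and F1. The function names are hypothetical and the project may well rely on a library such as scikit-learn instead; ROC-AUC, MAE, and RMSE are not covered here.

```python
from typing import Dict, List, Sequence


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str]) -> List[List[int]]:
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def macro_metrics(matrix: List[List[int]]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1 from a confusion matrix."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    precisions, recalls, f1s = [], [], []
    for i in range(n):
        tp = matrix[i][i]
        predicted = sum(matrix[r][i] for r in range(n))  # column sum
        actual = sum(matrix[i])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return {
        "accuracy": correct / total if total else 0.0,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```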
## 🔧 Configuration
### Model Settings
- Base model: `bhadresh-savani/distilbert-base-uncased-emotion`
- Audio sample rate: 16 kHz
- Batch size: 8 (configurable)
- Learning rate: 5e-5
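
One way to keep these settings together is a small dataclass. The `ModelConfig` name and its validation are assumptions for illustration; only the four default values come from the list above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ModelConfig:
    """Model settings listed in this README (class name is hypothetical)."""
    base_model: str = "bhadresh-savani/distilbert-base-uncased-emotion"
    sample_rate: int = 16_000      # Hz
    batch_size: int = 8
    learning_rate: float = 5e-5

    def __post_init__(self) -> None:
        # Reject obviously invalid values early.
        if self.batch_size <= 0:
            raise ValueError("batch_size must be positive")
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")


default_config = ModelConfig()
large_batch = replace(default_config, batch_size=16)  # override a single field
```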
### Feature Extraction
- MFCC: 13 coefficients
- Chroma features
- Mel spectrograms
- Spectral contrast
- Tonnetz features
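
The features listed above map onto functions in `librosa.feature`. The sketch below shows one plausible way to compute them and collapse each into a fixed-length vector by averaging over time; the function names and the averaging step are assumptions, not the project's confirmed approach.

```python
import numpy as np


def extract_feature_matrices(y: np.ndarray, sr: int = 16_000) -> dict:
    """Compute each listed feature as an (n_features, n_frames) matrix."""
    import librosa  # imported lazily so summarize() below works without librosa

    return {
        "mfcc": librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13),
        "chroma": librosa.feature.chroma_stft(y=y, sr=sr),
        "mel": librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr)),
        "contrast": librosa.feature.spectral_contrast(y=y, sr=sr),
        # Tonnetz is usually computed on the harmonic component of the signal.
        "tonnetz": librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr),
    }


def summarize(features: dict) -> np.ndarray:
    """Average every feature matrix over time and concatenate into one vector."""
    return np.concatenate([m.mean(axis=1) for m in features.values()])
```

Averaging over frames discards temporal detail, so a model that consumes sequences would instead keep the full matrices.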
## 📝 API Reference
### Audio Processing
```python
preprocess_audio(audio_file)
extract_features(audio_data)
```
### Model Interface
```python
analyze_emotion(audio_input)
train_model(data_path, epochs)
```
## 🤝 Contributing
1. Fork the repository
2. Create a feature branch
3. Commit changes
4. Push to the branch
5. Open a pull request
## 📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
## 🙏 Acknowledgments
- HuggingFace Transformers
- Librosa Audio Processing
- Gradio Interface Library
## 📞 Contact
For questions and support, please open an issue in the repository.
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference