---
title: Vocal Emotion Recognition
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 3.50.2
app_file: app.py
pinned: false
---

Check out the Spaces configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Vocal Emotion Recognition System

## 🎯 Project Overview

A deep learning-based system for real-time emotion recognition from vocal input, built on standard audio feature extraction and a pre-trained transformer classifier.

### Key Features

- Real-time vocal emotion analysis
- Advanced audio feature extraction
- Pre-trained transformer model integration
- User-friendly web interface
- Comprehensive evaluation metrics

## 🛠️ Technical Architecture

### Components

1. **Audio Processing Pipeline**
   - Sample rate standardization (16 kHz)
   - Noise reduction and normalization
   - Feature extraction (MFCC, chroma, mel spectrograms)
2. **Machine Learning Pipeline**
   - DistilBERT-based emotion classification
   - Transfer learning capabilities
   - Comprehensive evaluation metrics
3. **Web Interface**
   - Gradio-based interactive UI
   - Real-time processing
   - Intuitive result visualization

## 📦 Installation

1. **Clone the Repository**
   ```bash
   git clone [repository-url]
   cd vocal-emotion-recognition
   ```
2. **Install Dependencies**
   ```bash
   pip install -r requirements.txt
   ```
3. **Environment Setup**
   - Python 3.8+ required
   - CUDA-compatible GPU recommended for training
   - Microphone access required for real-time analysis

## 🚀 Usage

### Starting the Application

```bash
python app.py
```

- Access the web interface at `http://localhost:7860`
- Use microphone input for real-time analysis
- View emotion classification results instantly

### Training Custom Models

```bash
python model_training.py --data_path [path] --epochs [num]
```

## 📊 Model Performance

The system is evaluated with the following metrics:

- Accuracy, Precision, Recall, F1 Score
- ROC-AUC Score
- Confusion Matrix
- MAE and RMSE

## 🔧 Configuration

### Model Settings

- Base model: `bhadresh-savani/distilbert-base-uncased-emotion`
- Audio sample rate: 16 kHz
- Batch size: 8 (configurable)
- Learning rate: 5e-5

### Feature Extraction

- MFCC: 13 coefficients
- Chroma features
- Mel spectrograms
- Spectral contrast
- Tonnetz features

## 📝 API Reference

### Audio Processing

```python
preprocess_audio(audio_file)
extract_features(audio_data)
```

### Model Interface

```python
analyze_emotion(audio_input)
train_model(data_path, epochs)
```

Hedged usage sketches for these entry points are collected in the Example Sketches section at the end of this README.

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Open a pull request

## 📄 License

This project is licensed under the MIT License; see the LICENSE file for details.

## 🙏 Acknowledgments

- HuggingFace Transformers
- Librosa audio processing
- Gradio interface library

## 📞 Contact

For questions and support, please open an issue in the repository.
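
## 🧪 Example Sketches

The snippets below illustrate how the documented components could fit together. They are minimal sketches, not the actual contents of `app.py` or `model_training.py`; any function body, default value, or return shape that does not appear elsewhere in this README is an assumption.

### Feature Extraction

A minimal sketch of the feature-extraction step with Librosa, covering the features listed under Configuration (13 MFCCs, chroma, mel spectrogram, spectral contrast, tonnetz) at the project's 16 kHz sample rate. Time-averaging each feature matrix into one fixed-length vector is a common choice, assumed here rather than confirmed by the source.

```python
import numpy as np
import librosa

def extract_features(audio_path, sr=16000, n_mfcc=13):
    """Load audio at 16 kHz and return a fixed-length feature vector."""
    y, sr = librosa.load(audio_path, sr=sr)  # resample to the standard rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr)
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr)
    # Tonnetz is computed on the harmonic component of the signal
    tonnetz = librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr)
    # Collapse the time axis so clips of any length map to one vector
    return np.concatenate([f.mean(axis=1) for f in (mfcc, chroma, mel, contrast, tonnetz)])
```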
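
### Emotion Classification

The configured base model, `bhadresh-savani/distilbert-base-uncased-emotion`, is a *text* emotion classifier, so this sketch assumes the vocal input has already been transcribed; the speech-to-text step is not shown, and the example transcript is made up.

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="bhadresh-savani/distilbert-base-uncased-emotion",
)

transcript = "I can't believe we actually won!"  # stand-in for an ASR transcript
for result in classifier(transcript, top_k=None):  # top_k=None scores every label
    print(f"{result['label']}: {result['score']:.3f}")
```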
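
### Training CLI

A plausible argument parser for `model_training.py`, matching the command shown under Usage; the `--batch_size` and `--learning_rate` defaults follow the Configuration section, while the extra options themselves are assumptions about the script's interface.

```python
import argparse

def parse_args():
    parser = argparse.ArgumentParser(description="Fine-tune the emotion classifier")
    parser.add_argument("--data_path", required=True, help="Root of the labeled dataset")
    parser.add_argument("--epochs", type=int, required=True, help="Number of training epochs")
    parser.add_argument("--batch_size", type=int, default=8)       # per Configuration
    parser.add_argument("--learning_rate", type=float, default=5e-5)  # per Configuration
    return parser.parse_args()
```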
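
### Gradio Interface

A minimal Gradio 3.x interface (per the Space's `sdk_version`) wiring microphone input to the `analyze_emotion` entry point, as described under Web Interface. The function body is a placeholder with dummy scores; a real implementation would run the feature extraction and model above.

```python
import gradio as gr

def analyze_emotion(audio_path):
    # Placeholder: the real app extracts features and runs the classifier.
    # Returning a {label: score} dict lets gr.Label render the probabilities.
    return {"joy": 0.7, "sadness": 0.2, "anger": 0.1}

demo = gr.Interface(
    fn=analyze_emotion,
    inputs=gr.Audio(source="microphone", type="filepath"),
    outputs=gr.Label(num_top_classes=3),
    title="Vocal Emotion Recognition",
)

if __name__ == "__main__":
    demo.launch()  # serves http://localhost:7860 by default
```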