---
title: Vocal Emotion Recognition
emoji: 🎀
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 3.50.2
app_file: app.py
pinned: false
---

# Vocal Emotion Recognition System

## 🎯 Project Overview

A deep-learning system that recognizes emotion in vocal input in real time, combining audio feature extraction with a pretrained transformer classifier.

### Key Features
- Real-time vocal emotion analysis
- Advanced audio feature extraction
- Pre-trained transformer model integration
- User-friendly web interface
- Comprehensive evaluation metrics

## πŸ› οΈ Technical Architecture

### Components
1. **Audio Processing Pipeline**
   - Sample rate standardization (16 kHz)
   - Noise reduction and normalization
   - Feature extraction (MFCC, Chroma, Mel spectrograms)

2. **Machine Learning Pipeline**
   - DistilBERT-based emotion classification
   - Transfer learning capabilities
   - Comprehensive evaluation metrics

3. **Web Interface**
   - Gradio-based interactive UI
   - Real-time processing
   - Intuitive result visualization
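The first stage of the pipeline above can be sketched as follows. This is a minimal stand-in, not the project's actual code: it uses naive linear-interpolation resampling and peak normalization, where the real pipeline may use a proper resampler and noise reduction.

```python
import numpy as np

TARGET_SR = 16_000  # the pipeline standardizes all audio to 16 kHz

def preprocess_audio(samples: np.ndarray, sr: int) -> np.ndarray:
    """Resample to 16 kHz (naive linear interpolation) and peak-normalize."""
    if sr != TARGET_SR:
        duration = len(samples) / sr
        n_out = int(duration * TARGET_SR)
        old_t = np.linspace(0.0, duration, num=len(samples), endpoint=False)
        new_t = np.linspace(0.0, duration, num=n_out, endpoint=False)
        samples = np.interp(new_t, old_t, samples)
    peak = np.abs(samples).max()
    return samples / peak if peak > 0 else samples

# Example: half a second of a 440 Hz tone recorded at 44.1 kHz
sr = 44_100
t = np.arange(int(0.5 * sr)) / sr
audio = 0.3 * np.sin(2 * np.pi * 440 * t)
processed = preprocess_audio(audio, sr)
print(len(processed))  # 8000 samples, i.e. 0.5 s at 16 kHz
```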

## 📦 Installation

1. **Clone the Repository**
```bash
git clone [repository-url]
cd vocal-emotion-recognition
```

2. **Install Dependencies**
```bash
pip install -r requirements.txt
```

3. **Environment Setup**
- Python 3.8+ required
- CUDA-compatible GPU recommended for training
- Microphone access required for real-time analysis

## 🚀 Usage

### Starting the Application
```bash
python app.py
```
- Access the web interface at `http://localhost:7860`
- Use microphone input for real-time analysis
- View emotion classification results instantly

### Training Custom Models
```bash
python model_training.py --data_path [path] --epochs [num]
```
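A training CLI with those flags might be wired up as below. The flag names come from the command above; the default epoch count and the example path are assumptions for illustration, not the project's actual values.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Fine-tune the emotion model")
    parser.add_argument("--data_path", required=True,
                        help="Directory containing the labeled audio dataset")
    parser.add_argument("--epochs", type=int, default=3,
                        help="Number of training epochs")
    return parser

# Parse an example invocation (path is hypothetical)
args = build_parser().parse_args(["--data_path", "data/train", "--epochs", "10"])
print(args.data_path, args.epochs)  # data/train 10
```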

## 📊 Model Performance

The system is evaluated with the following metrics:
- Accuracy, Precision, Recall, F1 Score
- ROC-AUC Score
- Confusion Matrix
- MAE and RMSE
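To make the classification metrics concrete, here is a small self-contained illustration of how accuracy, per-class precision/recall, and macro F1 fall out of a confusion matrix. The numbers and class names are toy values, not the project's results.

```python
import numpy as np

# Toy 3-class confusion matrix: rows = true class, columns = predicted class
cm = np.array([
    [50,  5,  5],   # e.g. "angry"
    [10, 40, 10],   # e.g. "happy"
    [ 5,  5, 70],   # e.g. "neutral"
])

tp = np.diag(cm).astype(float)
precision = tp / cm.sum(axis=0)   # correct / everything predicted as the class
recall = tp / cm.sum(axis=1)      # correct / everything truly in the class
f1 = 2 * precision * recall / (precision + recall)

accuracy = tp.sum() / cm.sum()
macro_f1 = f1.mean()
print(f"accuracy={accuracy:.3f} macro_f1={macro_f1:.3f}")
```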

## 🔧 Configuration

### Model Settings
- Base model: `bhadresh-savani/distilbert-base-uncased-emotion`
- Audio sample rate: 16 kHz
- Batch size: 8 (configurable)
- Learning rate: 5e-5

### Feature Extraction
- MFCC: 13 coefficients
- Chroma features
- Mel spectrograms
- Spectral contrast
- Tonnetz features

## πŸ“ API Reference

### Audio Processing
```python
preprocess_audio(audio_file)   # resample to 16 kHz, denoise, normalize
extract_features(audio_data)   # MFCC, chroma, mel spectrogram, contrast, tonnetz
```

### Model Interface
```python
analyze_emotion(audio_input)   # run inference on a single audio clip
train_model(data_path, epochs) # fine-tune the model on a custom dataset
```

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Commit changes
4. Push to the branch
5. Open a pull request

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

## πŸ™ Acknowledgments

- HuggingFace Transformers
- Librosa Audio Processing
- Gradio Interface Library

## 📞 Contact

For questions and support, please open an issue in the repository.

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference