---
title: Vocal Emotion Recognition
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 3.50.2
app_file: app.py
pinned: false
---
# Vocal Emotion Recognition System
## 🎯 Project Overview
A deep learning-based system for real-time emotion recognition from vocal input using state-of-the-art audio processing and transformer models.
### Key Features
- Real-time vocal emotion analysis
- Advanced audio feature extraction
- Pre-trained transformer model integration
- User-friendly web interface
- Comprehensive evaluation metrics
## 🛠️ Technical Architecture
### Components
1. **Audio Processing Pipeline**
- Sample rate standardization (16kHz)
- Noise reduction and normalization
- Feature extraction (MFCC, Chroma, Mel spectrograms)
2. **Machine Learning Pipeline**
- DistilBERT-based emotion classification
- Transfer learning capabilities
- Comprehensive evaluation metrics
3. **Web Interface**
- Gradio-based interactive UI
- Real-time processing
- Intuitive result visualization
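The first stage of the pipeline above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the function name mirrors `preprocess_audio` from the API reference below, and the linear-interpolation resampler is a simplified stand-in for a proper polyphase or sinc resampler.

```python
import numpy as np

TARGET_SR = 16_000  # sample rate used throughout the project

def preprocess_audio(samples: np.ndarray, sr: int) -> np.ndarray:
    """Resample a mono signal to 16 kHz and peak-normalize it.

    Linear interpolation is a simplification for illustration; a
    production pipeline would use a band-limited resampler.
    """
    if sr != TARGET_SR:
        duration = len(samples) / sr
        n_out = int(round(duration * TARGET_SR))
        old_t = np.linspace(0.0, duration, num=len(samples), endpoint=False)
        new_t = np.linspace(0.0, duration, num=n_out, endpoint=False)
        samples = np.interp(new_t, old_t, samples)
    peak = np.max(np.abs(samples))
    return samples / peak if peak > 0 else samples
```

Feeding one second of 8 kHz audio through this function yields 16,000 samples scaled to a peak amplitude of 1.0.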
## 📦 Installation
1. **Clone the Repository**
```bash
git clone [repository-url]
cd vocal-emotion-recognition
```
2. **Install Dependencies**
```bash
pip install -r requirements.txt
```
3. **Environment Setup**
- Python 3.8+ required
- CUDA-compatible GPU recommended for training
- Microphone access required for real-time analysis
## 🚀 Usage
### Starting the Application
```bash
python app.py
```
- Access the web interface at `http://localhost:7860`
- Use microphone input for real-time analysis
- View emotion classification results instantly
### Training Custom Models
```bash
python model_training.py --data_path [path] --epochs [num]
```
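A command-line interface matching the call above might be built with `argparse`. This is a hedged sketch: the two flags come from the command shown, but the help text and the default epoch count are illustrative assumptions, not the script's actual values.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """CLI matching `model_training.py --data_path ... --epochs ...`."""
    p = argparse.ArgumentParser(description="Train a custom emotion model")
    p.add_argument("--data_path", required=True,
                   help="Directory of labelled training audio")
    p.add_argument("--epochs", type=int, default=3,
                   help="Number of training epochs (default is illustrative)")
    return p
```

For example, `build_parser().parse_args(["--data_path", "data", "--epochs", "5"])` produces a namespace with `data_path="data"` and `epochs=5`.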
## 📊 Model Performance
The system is evaluated using the following metrics:
- Accuracy, Precision, Recall, F1 Score
- ROC-AUC Score
- Confusion Matrix
- MAE and RMSE
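Several of the metrics listed above can be computed with scikit-learn (assumed here as a dependency; check `requirements.txt`). The labels below are toy data for illustration only:

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, confusion_matrix)

# Toy binary labels; a real evaluation would use a held-out test set.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

acc = accuracy_score(y_true, y_pred)      # fraction of correct predictions
prec = precision_score(y_true, y_pred)    # of predicted 1s, fraction correct
rec = recall_score(y_true, y_pred)        # of true 1s, fraction recovered
f1 = f1_score(y_true, y_pred)             # harmonic mean of precision/recall
cm = confusion_matrix(y_true, y_pred)     # rows: true class, cols: predicted
```

Here 4 of 6 predictions are correct, so accuracy is about 0.667, and precision and recall are each 2/3.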
## 🔧 Configuration
### Model Settings
- Base model: `bhadresh-savani/distilbert-base-uncased-emotion`
- Audio sample rate: 16kHz
- Batch size: 8 (configurable)
- Learning rate: 5e-5
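Gathered into code, the settings above might look like the mapping below. The key names are hypothetical, not the project's actual identifiers; only the values come from this README.

```python
# Hypothetical configuration mapping; key names are illustrative.
MODEL_CONFIG = {
    "base_model": "bhadresh-savani/distilbert-base-uncased-emotion",
    "sample_rate": 16_000,   # Hz
    "batch_size": 8,         # configurable
    "learning_rate": 5e-5,
}
```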
### Feature Extraction
- MFCC: 13 coefficients
- Chroma features
- Mel spectrograms
- Spectral contrast
- Tonnetz features
## 📚 API Reference
### Audio Processing
```python
preprocess_audio(audio_file)
extract_features(audio_data)
```
### Model Interface
```python
analyze_emotion(audio_input)
train_model(data_path, epochs)
```
## 🤝 Contributing
1. Fork the repository
2. Create a feature branch
3. Commit changes
4. Push to the branch
5. Open a pull request
## 📄 License
This project is licensed under the MIT License; see the LICENSE file for details.
## 🙏 Acknowledgments
- HuggingFace Transformers
- Librosa Audio Processing
- Gradio Interface Library
## 📞 Contact
For questions and support, please open an issue in the repository.
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference