---
title: Vocal Emotion Recognition
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 3.50.2
app_file: app.py
pinned: false
---
# Vocal Emotion Recognition System
## Project Overview
A deep learning-based system for real-time emotion recognition from vocal input using state-of-the-art audio processing and transformer models.
### Key Features
- Real-time vocal emotion analysis
- Advanced audio feature extraction
- Pre-trained transformer model integration
- User-friendly web interface
- Comprehensive evaluation metrics
## Technical Architecture
### Components
1. **Audio Processing Pipeline** (see the sketch below this list)
   - Sample rate standardization (16kHz)
   - Noise reduction and normalization
   - Feature extraction (MFCC, Chroma, Mel spectrograms)
2. **Machine Learning Pipeline**
   - DistilBERT-based emotion classification
   - Transfer learning capabilities
   - Comprehensive evaluation metrics
3. **Web Interface**
   - Gradio-based interactive UI
   - Real-time processing
   - Intuitive result visualization
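A minimal sketch of the preprocessing step in item 1, assuming `librosa` (acknowledged at the end of this README) and the 16kHz target rate listed above; the function name `preprocess_audio` mirrors the API reference below, but the exact signature in `app.py` may differ.
```python
import librosa
import numpy as np

TARGET_SAMPLE_RATE = 16000  # 16kHz standardization, as listed above

def preprocess_audio(audio_file: str) -> np.ndarray:
    """Load an audio file, resample to 16kHz, and normalize its amplitude."""
    # librosa converts to mono and resamples to the requested rate on load
    audio, _ = librosa.load(audio_file, sr=TARGET_SAMPLE_RATE, mono=True)
    # Trim leading/trailing silence as a simple form of noise reduction
    audio, _ = librosa.effects.trim(audio, top_db=20)
    # Peak-normalize so every clip lands in the same amplitude range
    peak = np.max(np.abs(audio))
    return audio / peak if peak > 0 else audio
```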
## Installation
1. **Clone the Repository**
```bash
git clone [repository-url]
cd vocal-emotion-recognition
```
2. **Install Dependencies**
```bash
pip install -r requirements.txt
```
3. **Environment Setup**
   - Python 3.8+ required
   - CUDA-compatible GPU recommended for training
   - Microphone access required for real-time analysis
## Usage
### Starting the Application
```bash
python app.py
```
- Access the web interface at `http://localhost:7860`
- Use microphone input for real-time analysis
- View emotion classification results instantly
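For orientation, a minimal Gradio 3.x interface of this shape might look like the sketch below. The real `app.py` may differ: `analyze_emotion` is the function named in the API reference section, and its return format is assumed here to be a label-to-score dictionary; the import path is hypothetical.
```python
import gradio as gr

# analyze_emotion(audio_input) is the entry point named in the API reference;
# it is assumed to return a {label: score} dict that gr.Label can display.
from emotion_model import analyze_emotion  # hypothetical module name

demo = gr.Interface(
    fn=analyze_emotion,
    inputs=gr.Audio(source="microphone", type="filepath"),
    outputs=gr.Label(num_top_classes=3),
    title="Vocal Emotion Recognition",
)

if __name__ == "__main__":
    demo.launch()  # serves on http://localhost:7860 by default
```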
### Training Custom Models
```bash
python model_training.py --data_path [path] --epochs [num]
```
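The internals of `model_training.py` are not shown in this README; purely as an illustration of how the two flags above could be wired, an entry point might look like this (every name inside is hypothetical).
```python
import argparse

from training import train_model  # hypothetical import path for the trainer named below

def main() -> None:
    parser = argparse.ArgumentParser(description="Fine-tune the emotion classifier.")
    parser.add_argument("--data_path", required=True, help="Directory with labelled audio data")
    parser.add_argument("--epochs", type=int, default=3, help="Number of training epochs")
    args = parser.parse_args()

    # train_model(data_path, epochs) is the function named in the API reference below
    train_model(args.data_path, args.epochs)

if __name__ == "__main__":
    main()
```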
## Model Performance
The system is evaluated with the following metrics (see the sketch below this list for how they can be computed):
- Accuracy, Precision, Recall, F1 Score
- ROC-AUC Score
- Confusion Matrix
- MAE and RMSE
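A minimal sketch of computing these metrics, assuming integer class labels and per-class probability scores from the classifier; scikit-learn itself is an assumption, since the README does not name the evaluation library.
```python
import numpy as np
from sklearn.metrics import (
    accuracy_score, precision_recall_fscore_support,
    roc_auc_score, confusion_matrix,
    mean_absolute_error, mean_squared_error,
)

def evaluate(y_true: np.ndarray, y_pred: np.ndarray, y_proba: np.ndarray) -> dict:
    """Compute the metrics listed above for a multi-class classifier."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted", zero_division=0
    )
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # One-vs-rest ROC-AUC over per-class probabilities of shape (n_samples, n_classes)
        "roc_auc": roc_auc_score(y_true, y_proba, multi_class="ovr"),
        "confusion_matrix": confusion_matrix(y_true, y_pred),
        # Treating class indices as ordinal values, since the README lists MAE/RMSE
        "mae": mean_absolute_error(y_true, y_pred),
        "rmse": np.sqrt(mean_squared_error(y_true, y_pred)),
    }
```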
## Configuration
### Model Settings
- Base model: `bhadresh-savani/distilbert-base-uncased-emotion`
- Audio sample rate: 16kHz
- Batch size: 8 (configurable)
- Learning rate: 5e-5
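A sketch of how these settings could be wired up with the `transformers` library, assuming the fine-tuning uses a Hugging Face `Trainer`; the dataset and tokenization steps are omitted, and the output directory and epoch count are made up.
```python
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
)

BASE_MODEL = "bhadresh-savani/distilbert-base-uncased-emotion"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForSequenceClassification.from_pretrained(BASE_MODEL)

# Hyperparameters taken from the Model Settings list above
training_args = TrainingArguments(
    output_dir="./emotion-finetune",  # made-up directory name
    per_device_train_batch_size=8,
    learning_rate=5e-5,
    num_train_epochs=3,               # assumption; set via --epochs in practice
)
```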
### Feature Extraction
- MFCC: 13 coefficients
- Chroma features
- Mel spectrograms
- Spectral contrast
- Tonnetz features
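A minimal sketch of extracting this feature set with `librosa`; the concatenated, time-averaged vector shown here is one common way to summarize a clip, not necessarily the exact representation used in this repository.
```python
import librosa
import numpy as np

def extract_features(audio: np.ndarray, sample_rate: int = 16000) -> np.ndarray:
    """Build a single feature vector from the five feature types listed above."""
    mfcc = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=13)
    chroma = librosa.feature.chroma_stft(y=audio, sr=sample_rate)
    mel = librosa.feature.melspectrogram(y=audio, sr=sample_rate)
    contrast = librosa.feature.spectral_contrast(y=audio, sr=sample_rate)
    tonnetz = librosa.feature.tonnetz(y=librosa.effects.harmonic(audio), sr=sample_rate)
    # Average each feature over time and concatenate into one vector
    return np.concatenate([
        feat.mean(axis=1) for feat in (mfcc, chroma, mel, contrast, tonnetz)
    ])
```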
## API Reference
### Audio Processing
```python
preprocess_audio(audio_file)    # load, resample to 16kHz, and normalize an audio file
extract_features(audio_data)    # compute MFCC, chroma, mel, contrast, and tonnetz features
```
### Model Interface
```python
analyze_emotion(audio_input)    # return the predicted emotion for a single clip
train_model(data_path, epochs)  # fine-tune the classifier on a labelled dataset
```
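End to end, the calls above compose roughly as follows; the exact signatures and return types are not documented in this README, so treat this as an assumed flow with hypothetical file and dataset paths.
```python
# Real-time path (what the web UI calls)
emotion = analyze_emotion("clip.wav")            # "clip.wav" is a hypothetical file
print(emotion)

# Lower-level audio utilities
audio = preprocess_audio("clip.wav")
features = extract_features(audio)

# Offline fine-tuning on a custom dataset
train_model(data_path="./my_dataset", epochs=5)  # hypothetical path and epoch count
```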
## Contributing
1. Fork the repository
2. Create a feature branch
3. Commit changes
4. Push to the branch
5. Open a pull request
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments
- HuggingFace Transformers
- Librosa Audio Processing
- Gradio Interface Library
## Contact
For questions and support, please open an issue in the repository.
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference |