---
title: Vocal Emotion Recognition
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.6.0
app_file: app.py
pinned: false
---

# Vocal Emotion Recognition System

## 🎯 Project Overview

A deep learning-based system for real-time emotion recognition from vocal input using state-of-the-art audio processing and transformer models.

### Key Features

- Real-time vocal emotion analysis
- Advanced audio feature extraction
- Pre-trained transformer model integration
- User-friendly web interface
- Comprehensive evaluation metrics

## 🛠️ Technical Architecture

### Components

1. **Audio Processing Pipeline**
   - Sample rate standardization (16kHz)
   - Noise reduction and normalization
   - Feature extraction (MFCC, Chroma, Mel spectrograms)

2. **Machine Learning Pipeline**
   - DistilBERT-based emotion classification
   - Transfer learning capabilities
   - Comprehensive evaluation metrics

3. **Web Interface**
   - Gradio-based interactive UI
   - Real-time processing
   - Intuitive result visualization

## 📦 Installation

1. **Clone the Repository**

   ```bash
   git clone [repository-url]
   cd vocal-emotion-recognition
   ```

2. **Install Dependencies**

   ```bash
   pip install -r requirements.txt
   ```

3. **Environment Setup**
   - Python 3.8+ required
   - CUDA-compatible GPU recommended for training
   - Microphone access required for real-time analysis

## 🚀 Usage

### Starting the Application

```bash
python app.py
```

- Access the web interface at `http://localhost:7860`
- Use microphone input for real-time analysis
- View emotion classification results instantly

### Training Custom Models

```bash
python model_training.py --data_path [path] --epochs [num]
```

## 📊 Model Performance

The system utilizes various metrics for evaluation:

- Accuracy, Precision, Recall, F1 Score
- ROC-AUC Score
- Confusion Matrix
- MAE and RMSE

## 🔧 Configuration

### Model Settings

- Base model: `bhadresh-savani/distilbert-base-uncased-emotion`
- Audio sample rate: 16kHz
- Batch size: 8 (configurable)
- Learning rate: 5e-5

### Feature Extraction

- MFCC: 13 coefficients
- Chroma features
- Mel spectrograms
- Spectral contrast
- Tonnetz features

## 📝 API Reference

### Audio Processing

```python
preprocess_audio(audio_file)
extract_features(audio_data)
```

### Model Interface

```python
analyze_emotion(audio_input)
train_model(data_path, epochs)
```

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Commit changes
4. Push to the branch
5. Open a pull request

## 📄 License

This project is licensed under the MIT License; see the LICENSE file for details.

## 🙏 Acknowledgments

- HuggingFace Transformers
- Librosa Audio Processing
- Gradio Interface Library

## 📞 Contact

For questions and support, please open an issue in the repository.

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
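
### Appendix: Preprocessing Sketch

To illustrate the sample rate standardization (16kHz) and normalization steps listed under the Audio Processing Pipeline, here is a minimal standard-library sketch. The function names `peak_normalize` and `resample_linear` are hypothetical and are not this project's `preprocess_audio`. The linear-interpolation resampler is a deliberately naive stand-in with no anti-aliasing filter; a real pipeline would more likely rely on a library resampler.

```python
import math

TARGET_SR = 16_000  # sample rate named in this README


def peak_normalize(samples, peak=1.0):
    """Scale samples so the largest absolute value equals `peak` (hypothetical helper)."""
    max_abs = max((abs(s) for s in samples), default=0.0)
    if max_abs == 0.0:
        return list(samples)  # silence: nothing to scale
    scale = peak / max_abs
    return [s * scale for s in samples]


def resample_linear(samples, orig_sr, target_sr=TARGET_SR):
    """Naive linear-interpolation resampling; assumption: no anti-aliasing filter."""
    if orig_sr == target_sr or len(samples) < 2:
        return list(samples)
    n_out = len(samples) * target_sr // orig_sr
    ratio = orig_sr / target_sr
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * ratio               # fractional position in the input signal
        lo = min(int(pos), last)      # sample at or before that position
        hi = min(lo + 1, last)        # following sample, clamped at the end
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


# One second of a 440 Hz tone recorded at 44.1 kHz, at half amplitude.
tone = [0.5 * math.sin(2 * math.pi * 440 * t / 44_100) for t in range(44_100)]
clean = peak_normalize(resample_linear(tone, 44_100))
print(len(clean), max(abs(s) for s in clean))
```

Resampling before normalizing keeps the peak of the final 16kHz signal at the requested level, since interpolation can slightly change the maximum amplitude.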