Upload speech emotion recognition model
README.md ADDED

# SentimentSound

## Overview

This is a deep learning model for Speech Emotion Recognition that classifies audio clips into different emotional states. The model is trained on a dataset of speech samples and can identify eight emotions: neutral, calm, happy, sad, angry, fearful, disgust, and surprised.

## Model Details

- **Model Type:** Hybrid Neural Network (CNN + LSTM); a rough sketch of this architecture follows the emotion list below
- **Input:** Audio features extracted from 3-second WAV files
- **Output:** One of the eight emotion classes listed below

### Supported Emotions

- Neutral
- Calm
- Happy
- Sad
- Angry
- Fearful
- Disgust
- Surprised
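
The exact layer configuration is not spelled out in this card, so the following sketch is only an illustration of the kind of CNN + LSTM hybrid described above: a 1-D convolutional front end over the per-frame feature sequence, an LSTM over the convolved frames, and a linear head over the eight emotion classes. The class name, the layer sizes, and the assumption that the model is written in PyTorch are all illustrative, not the repository's actual code.

```python
# Hypothetical sketch of a CNN + LSTM hybrid for 8-class speech emotion
# recognition. Layer sizes, names, and the (batch, frames, features) input
# layout are assumptions for illustration, not SentimentSound's real code.
import torch
import torch.nn as nn


class EmotionCNNLSTM(nn.Module):
    def __init__(self, n_features: int = 62, hidden_size: int = 128, n_classes: int = 8):
        super().__init__()
        # 1-D convolutions over the time axis of the feature sequence
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 64, kernel_size=5, padding=2),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm1d(128),
            nn.ReLU(),
        )
        # LSTM over the convolved frame sequence, then a linear classifier
        self.lstm = nn.LSTM(input_size=128, hidden_size=hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, n_features); Conv1d expects (batch, channels, frames)
        x = self.conv(x.transpose(1, 2)).transpose(1, 2)
        _, (h_n, _) = self.lstm(x)           # h_n: (1, batch, hidden_size)
        return self.classifier(h_n[-1])      # logits: (batch, n_classes)


if __name__ == "__main__":
    dummy = torch.randn(2, 130, 62)          # 2 clips, 130 frames, 62 features
    print(EmotionCNNLSTM()(dummy).shape)     # torch.Size([2, 8])
```

Using the final LSTM hidden state keeps the classifier input a fixed size regardless of how many frames a clip produces.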

## Installation

### Clone the Repository

```bash
git clone https://github.com/Vishal-Padia/SentimentSound.git
```

### Dependencies

```bash
pip install -r requirements.txt
```

### Usage Example

```bash
python emotion_predictor.py
```
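
`emotion_predictor.py` handles prediction end to end. Purely to illustrate the input side, the sketch below shows how the features listed under Training Details can be computed for a single 3-second clip with librosa; the 22,050 Hz sample rate, `n_mfcc=40`, the helper name, and the placeholder file name are assumptions, not the script's actual internals.

```python
# Hypothetical input-side sketch: compute the documented features for one clip.
# Settings and names are illustrative, not taken from emotion_predictor.py.
import librosa
import numpy as np
import torch


def extract_features(path: str, sr: int = 22050) -> torch.Tensor:
    y, sr = librosa.load(path, sr=sr, duration=3.0)        # load at most 3 seconds
    feats = np.vstack([
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40),       # MFCC
        librosa.feature.spectral_centroid(y=y, sr=sr),     # Spectral Centroid
        librosa.feature.chroma_stft(y=y, sr=sr),           # Chroma Features
        librosa.feature.spectral_contrast(y=y, sr=sr),     # Spectral Contrast
        librosa.feature.zero_crossing_rate(y),             # Zero Crossing Rate
        librosa.feature.spectral_rolloff(y=y, sr=sr),      # Spectral Rolloff
    ])
    # (feature_rows, frames) -> (frames, feature_rows): one vector per frame
    return torch.tensor(feats.T, dtype=torch.float32)


features = extract_features("some_clip.wav")   # placeholder path
# Roughly (130, 62) with these settings; the trained CNN + LSTM would consume
# this as a batch, e.g. logits = model(features.unsqueeze(0))
print(features.shape)
```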

## Model Performance

- **Accuracy:** 85%
- **Evaluation Metrics:** see the confusion matrix below

![Confusion matrix](confusion_matrix.png)
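
The confusion matrix shipped with the repository is the authoritative figure. If you want a similar plot for your own evaluation split, a minimal sketch with scikit-learn and matplotlib (assumptions here, not necessarily the repository's tooling) looks like this; `y_true` and `y_pred` are placeholders for your own label and prediction arrays.

```python
# Hedged sketch: plot a confusion matrix from held-out labels and predictions.
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, accuracy_score

EMOTIONS = ["neutral", "calm", "happy", "sad",
            "angry", "fearful", "disgust", "surprised"]

y_true = [0, 1, 2, 3, 4, 5, 6, 7]   # placeholder ground-truth class indices
y_pred = [0, 1, 2, 3, 4, 5, 6, 7]   # placeholder predicted class indices

print(f"accuracy: {accuracy_score(y_true, y_pred):.2%}")
ConfusionMatrixDisplay.from_predictions(
    y_true, y_pred, display_labels=EMOTIONS, xticks_rotation=45
)
plt.tight_layout()
plt.savefig("confusion_matrix.png")
```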

## Training Details

- **Feature Extraction:** (a librosa sketch of these features appears under Usage Example)
  - MFCC
  - Spectral Centroid
  - Chroma Features
  - Spectral Contrast
  - Zero Crossing Rate
  - Spectral Rolloff
- **Augmentation:** Random noise and scaling applied
- **Training Techniques:** (illustrated in the sketch after this list)
  - Class-weighted loss
  - AdamW optimizer
  - Learning rate scheduling
  - Gradient clipping
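
For readers unfamiliar with how those pieces fit together, here is a minimal PyTorch sketch of a training run that applies random noise and scaling augmentation, a class-weighted cross-entropy loss, AdamW, learning-rate scheduling (ReduceLROnPlateau is an assumed choice), and gradient clipping. The synthetic data, the tiny stand-in model, and every hyperparameter are illustrative assumptions rather than SentimentSound's actual training code.

```python
# Hedged sketch of the listed training techniques; all values are illustrative.
import torch
import torch.nn as nn


def augment(batch: torch.Tensor) -> torch.Tensor:
    """Random Gaussian noise plus random amplitude scaling."""
    noise = 0.01 * torch.randn_like(batch)
    scale = torch.empty(batch.size(0), 1, 1).uniform_(0.8, 1.2)
    return batch * scale + noise


# Tiny stand-in model and synthetic data so the loop actually runs.
model = nn.Sequential(nn.Flatten(), nn.Linear(130 * 62, 8))
features = torch.randn(64, 130, 62)        # 64 clips, 130 frames, 62 features
labels = torch.randint(0, 8, (64,))        # 8 emotion classes

# Class-weighted loss: rarer classes get proportionally larger weights.
counts = torch.bincount(labels, minlength=8).float().clamp(min=1)
criterion = nn.CrossEntropyLoss(weight=counts.sum() / (8 * counts))

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=2)

for epoch in range(3):
    for i in range(0, len(features), 16):  # simple mini-batching
        x, y = augment(features[i:i + 16]), labels[i:i + 16]
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # gradient clipping
        optimizer.step()
    scheduler.step(loss.item())            # step the LR schedule on the latest loss
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```

Weighting each class by the inverse of its frequency keeps the more common emotions from dominating the loss, which matters when the emotion classes are imbalanced.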

## Limitations

- Works best with clear speech recordings
- Optimized for 3-second audio clips
- Performance may vary with different audio sources

## Acknowledgments

- Dataset used for training: [RAVDESS Emotional Speech Audio](https://www.kaggle.com/datasets/uwrfkaggler/ravdess-emotional-speech-audio)