# Enhanced Facial Analysis Application

A comprehensive application for real-time analysis of facial expressions and audio in videos and webcam feeds. This application leverages AI models to interpret users' emotional and cognitive states, providing detailed insights for advertising and user experience research.

## Features

### 1. Live Metrics Visualization
- **Real-time processing**: View facial metrics and analysis as they change in both video and webcam inputs
- **Dynamic dashboard**: Visualize multiple metrics simultaneously with intuitive gauges
- **Progress tracking**: Monitor video processing with detailed status updates

### 2. Gemini 2.0 Flash Integration
- **Advanced contextual analysis**: Uses Google's Gemini 2.0 Flash to analyze facial expressions in context
- **Detailed user state reports**: Provides both summary labels and in-depth explanations of detected emotional states
- **Ad context awareness**: Interprets reactions based on the advertisement type and content

### 3. Audio Analysis Integration
- **Multimodal sensing**: Combines facial expression and voice tone analysis for more accurate emotional detection
- **Emotion classification**: Identifies emotions like happiness, sadness, anger, and fear from audio
- **Confidence metrics**: Provides reliability scores for detected audio emotions

### 4. Comprehensive Metrics
- **Psychological dimensions**: Measures valence, arousal, dominance, cognitive load, and more
- **Personality indicators**: Estimates traits like openness, agreeableness, neuroticism, etc.
- **Engagement tracking**: Monitors attention and involvement levels throughout the content

## Installation

### Prerequisites
- Python 3.8 or higher
- Pip package manager
- FFmpeg (for audio processing)

### Step 1: Clone or download the repository
```bash
git clone https://github.com/yourusername/enhanced-facial-analysis.git
cd enhanced-facial-analysis
```

### Step 2: Set up a virtual environment (recommended)
```bash
python -m venv venv
source venv/bin/activate  # On Windows, use: venv\Scripts\activate
```

### Step 3: Install dependencies
```bash
pip install -r requirements.txt
```

### Step 4: Set up API keys
Create a `.env` file in the project root and add your Gemini API key:
```
GEMINI_API_KEY=your_api_key_here
```
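
The application is expected to read this key from the environment at startup. As an illustration of what that loading step looks like, here is a minimal standard-library sketch (the app itself may use a package such as `python-dotenv` instead; `load_env` is a hypothetical helper, not a function from this codebase):

```python
import os

def load_env(path=".env"):
    """Minimal .env loader: copies KEY=value lines into os.environ.

    Illustrative only -- the application may use python-dotenv instead.
    """
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())

load_env()
api_key = os.environ.get("GEMINI_API_KEY")
```

Keeping the key in `.env` (and out of version control) means the same code works locally and on a deployed Space without edits.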

To obtain a Gemini API key:
1. Visit [Google AI Studio](https://ai.google.dev/)
2. Sign up or log in
3. Navigate to the API keys section
4. Create a new API key

## Usage

### Starting the Application
```bash
python app.py
```
This will launch a Gradio web interface accessible at http://localhost:7860

### Video File Analysis
1. Navigate to the "Video File API" tab
2. Upload a video file (MP4, AVI, MOV, etc.)
3. Enter optional ad context information:
   - Ad Description: Brief description of the advertisement
   - Ad Detail Focus: Specific aspect of the ad being analyzed
   - Ad Type/Genre: Category (Video, Funny, Serious, etc.)
4. Adjust the sampling rate (lower values = more detailed analysis but slower processing)
5. Click "Process Video"
6. View results in the dashboard and download the CSV data for further analysis

### Webcam Analysis

1. Navigate to the "Webcam API" tab
2. Ensure your webcam is connected and permissions are granted
3. Enter optional ad context information
4. Toggle "Record Audio" on if you want to include audio analysis
5. Click "Start Session"
6. The application will display:
   - Processed webcam feed with facial landmarks
   - Live metrics visualization
   - Real-time user state analysis
7. Click "End Session" to stop recording and save the data

## Technical Details

### Architecture
- **Frontend**: Gradio web interface
- **Video Processing**: OpenCV and MediaPipe Face Mesh
- **Audio Processing**: Librosa and Transformers
- **AI Models**:
  - MediaPipe for facial landmark detection
  - Custom metrics calculation algorithms
  - Hugging Face audio emotion detection model
  - Google Gemini 2.0 Flash for contextual interpretation

### Data Flow

1. Input source (video file/webcam) → Frame extraction
2. Each frame → MediaPipe → Facial landmarks
3. Landmarks → Custom algorithms → Facial metrics
4. (Optional) Audio → Hugging Face model → Audio emotion metrics
5. All metrics + Context → Gemini API → Detailed user state analysis
6. Results → Visualization and CSV export

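
The data flow above can be sketched as a per-frame loop. The stub functions below stand in for the real MediaPipe, audio, and Gemini calls, whose actual names in this codebase are not shown here:

```python
def process_stream(frames, sampling_rate=5, audio_metrics=None, ad_context=None):
    """Skeleton of the data flow: sample frames, extract metrics, collect rows."""
    rows = []
    for i, frame in enumerate(frames):
        if i % sampling_rate != 0:           # honor the sampling rate
            continue
        landmarks = detect_landmarks(frame)  # MediaPipe face mesh (stub)
        if landmarks is None:
            continue                         # no face in this frame
        metrics = compute_metrics(landmarks) # custom facial metrics (stub)
        if audio_metrics:
            metrics.update(audio_metrics)    # optional audio emotion metrics
        metrics["user_state"] = interpret(metrics, ad_context)  # Gemini call (stub)
        metrics["frame_number"] = i
        rows.append(metrics)                 # rows feed the dashboard / CSV export
    return rows

# Stubs so the sketch runs standalone; the real app replaces these.
def detect_landmarks(frame):
    return frame  # pretend every frame contains a face

def compute_metrics(landmarks):
    return {"valence": 0.5, "arousal": 0.5}

def interpret(metrics, ad_context):
    return "neutral"
```

For example, `process_stream(range(10), sampling_rate=5)` analyzes only frames 0 and 5, which is exactly the speed/detail trade-off the sampling rate setting controls.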
### Metrics Calculation

The application extracts several facial features:
- Eye Aspect Ratio (EAR): Measures eye openness
- Mouth Aspect Ratio (MAR): Measures mouth openness
- Eyebrow Position: Detects raised or lowered eyebrows
- Head Pose: Estimates vertical and horizontal head tilt

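
As an illustration of the first feature, the Eye Aspect Ratio is conventionally computed from six eye landmarks in the standard Soukupová & Čech formulation (the specific MediaPipe landmark indices this app maps to p1..p6 are not shown here):

```python
import numpy as np

def eye_aspect_ratio(eye):
    """EAR from six (x, y) eye landmarks ordered p1..p6 around the eye.

    EAR = (|p2 - p6| + |p3 - p5|) / (2 * |p1 - p4|)
    Larger values mean a more open eye; values near 0 indicate a blink.
    """
    eye = np.asarray(eye, dtype=float)
    vertical = np.linalg.norm(eye[1] - eye[5]) + np.linalg.norm(eye[2] - eye[4])
    horizontal = np.linalg.norm(eye[0] - eye[3])
    return vertical / (2.0 * horizontal)

# A wide-open eye, tall relative to its width, yields a higher EAR (~0.67 here).
open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
```

The Mouth Aspect Ratio follows the same vertical-over-horizontal pattern with mouth landmarks.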
These features are used to calculate higher-level metrics:

- **Valence**: Emotional positivity/negativity (0-1)
- **Arousal**: Level of excitement/activation (0-1)
- **Dominance**: Feeling of control (0-1)
- **Cognitive Load**: Mental effort (0-1)
- **Emotional Stability**: Resilience vs. volatility (0-1)
- **Stress Index**: Level of tension (0-1)
- **Engagement**: Attention and involvement (0-1)

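
The exact formulas are internal to the app. As a hedged sketch, higher-level metrics of this kind are typically weighted combinations of the raw features, clamped to the 0-1 range; the weights below are invented purely for illustration:

```python
def clamp01(x):
    """Clamp a raw score into the 0-1 range used by all metrics."""
    return max(0.0, min(1.0, x))

def arousal_from_features(ear, mar, eyebrow_raise):
    """Hypothetical arousal estimate: wide eyes, an open mouth, and raised
    eyebrows all push arousal up. Weights are illustrative, not the app's."""
    return clamp01(0.4 * ear + 0.3 * mar + 0.3 * eyebrow_raise)
```

Clamping keeps every metric comparable on the same 0-1 gauge regardless of how extreme the underlying features get.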
## Output Files

### CSV Data
The application generates CSV files with the following columns:
- `timestamp`: Time in seconds from start
- `frame_number`: Sequential frame identifier
- Facial metrics (valence, arousal, etc.)
- Audio metrics (audio_valence, audio_emotion, etc.)
- Ad context information
- `user_state`: Short summary of detected state
- `detailed_user_analysis`: In-depth interpretation from Gemini

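
A typical way to explore the exported CSV with pandas (column names follow the list above; the filename and sample values are placeholders):

```python
import io
import pandas as pd

# In practice: df = pd.read_csv("analysis_results.csv")
sample = io.StringIO(
    "timestamp,frame_number,valence,arousal,user_state\n"
    "0.0,0,0.55,0.40,attentive\n"
    "0.5,15,0.70,0.62,engaged\n"
)
df = pd.read_csv(sample)

mean_valence = df["valence"].mean()    # average emotional positivity over the clip
peak = df.loc[df["arousal"].idxmax()]  # the moment of highest activation
```

Because every row carries a `timestamp`, the metrics can be aligned directly against the advertisement's timeline for scene-by-scene analysis.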
### Processed Video

If enabled, the application saves an annotated video showing:
- Facial landmarks
- Current emotional state
- Detected audio emotion (if available)

## Customization

### Audio Model Selection
You can modify the audio classification model by changing this line in the code:
```python
model_name = "ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition"
```
Replace with any compatible Hugging Face audio classification model.
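
For example, the classifier could sit behind a small factory so swapping models stays a one-line change (the function below is a hypothetical sketch, not code from this repository; it uses the standard Transformers `pipeline` API):

```python
AUDIO_MODEL_NAME = "ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition"

def build_audio_classifier(model_name=AUDIO_MODEL_NAME):
    """Return a Hugging Face audio-classification pipeline for the given model.

    The import is deferred so that merely changing the model name is free;
    the model weights are only downloaded when the classifier is built.
    """
    from transformers import pipeline
    return pipeline("audio-classification", model=model_name)

# Swap in another emotion model by passing its Hub ID, e.g.:
# classifier = build_audio_classifier("superb/hubert-large-superb-er")
```

Any model that supports the `audio-classification` pipeline task should work, though its emotion label set may differ from the default model's.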

### Sampling Rate
Adjust the sampling rate to balance between processing speed and analysis detail:
- Lower values (1-3): Process more frames, more detailed but slower
- Higher values (10+): Process fewer frames, faster but less detailed

### Metrics Visualization

The application uses a gauge-style visualization for metrics. You can customize the appearance by modifying the `update_metrics_visualization` function.

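
A minimal gauge in the same spirit, for readers who want a starting point (this is an illustrative matplotlib sketch, not the app's actual `update_metrics_visualization` implementation):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this also runs without a display
import matplotlib.pyplot as plt
import numpy as np

def draw_gauge(ax, label, value):
    """Semicircular gauge for a 0-1 metric: gray track plus a filled arc."""
    track = np.linspace(np.pi, 0, 100)
    ax.plot(np.cos(track), np.sin(track), color="lightgray", linewidth=6)
    arc = np.linspace(np.pi, np.pi * (1 - value), 50)  # fill `value` of the track
    ax.plot(np.cos(arc), np.sin(arc), color="tab:blue", linewidth=6)
    ax.text(0, -0.25, f"{label}\n{value:.2f}", ha="center")
    ax.set_axis_off()

fig, axes = plt.subplots(1, 3, figsize=(9, 3))
for ax, (name, val) in zip(axes, [("Valence", 0.7), ("Arousal", 0.4), ("Engagement", 0.9)]):
    draw_gauge(ax, name, val)
```

Changing colors, thresholds, or the number of gauges per row is then a matter of editing this one drawing routine.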
## Troubleshooting

### Common Issues

**No face detected**
- Ensure adequate lighting
- Position face within camera frame
- Check if MediaPipe Face Mesh is properly initialized

**Audio not being analyzed**
- Verify microphone permissions
- Check if audio recording is enabled
- Ensure FFmpeg is installed (required for audio extraction)

**Slow processing**
- Increase sampling rate
- Use a lower resolution video
- Check system resources (CPU/GPU usage)

**Gemini API errors**
- Verify API key is correct and active
- Check internet connection
- Ensure you haven't exceeded API rate limits

## Dependencies

The application relies on several key libraries:
- Gradio: Web interface
- OpenCV: Video processing
- MediaPipe: Facial landmark detection
- Librosa: Audio processing
- Transformers: Hugging Face models
- Google Generative AI: Gemini API access
- MoviePy: Audio extraction from video
- Matplotlib: Data visualization
- Pandas: Data manipulation