ans123 committed (verified) · Commit 3e8ea16 · 1 parent: 17fd678

Update README.md

Files changed (1): README.md (+200 −12)
- ---
- title: PSYCHOMETER 2.0
- emoji: 💻
- colorFrom: pink
- colorTo: pink
- sdk: gradio
- sdk_version: 5.28.0
- app_file: app.py
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# Enhanced Facial Analysis Application

A comprehensive application for real-time analysis of facial expressions and audio in videos and webcam feeds. It leverages AI models to interpret users' emotional and cognitive states, providing detailed insights for advertising and user-experience research.

## Features

### 1. Live Metrics Visualization
- **Real-time processing**: View facial metrics and analysis as they change, for both video and webcam inputs
- **Dynamic dashboard**: Visualize multiple metrics simultaneously with intuitive gauges
- **Progress tracking**: Monitor video processing with detailed status updates

### 2. Gemini 2.0 Flash Integration
- **Advanced contextual analysis**: Uses Google's Gemini 2.0 Flash to analyze facial expressions in context
- **Detailed user state reports**: Provides both summary labels and in-depth explanations of detected emotional states
- **Ad context awareness**: Interprets reactions based on the advertisement's type and content

### 3. Audio Analysis Integration
- **Multimodal sensing**: Combines facial expression and voice-tone analysis for more accurate emotion detection
- **Emotion classification**: Identifies emotions such as happiness, sadness, anger, and fear from audio
- **Confidence metrics**: Provides reliability scores for detected audio emotions

### 4. Comprehensive Metrics
- **Psychological dimensions**: Measures valence, arousal, dominance, cognitive load, and more
- **Personality indicators**: Estimates traits such as openness, agreeableness, and neuroticism
- **Engagement tracking**: Monitors attention and involvement levels throughout the content

## Installation

### Prerequisites
- Python 3.8 or higher
- pip package manager
- FFmpeg (for audio processing)

### Step 1: Clone or download the repository
```bash
git clone https://github.com/yourusername/enhanced-facial-analysis.git
cd enhanced-facial-analysis
```

### Step 2: Set up a virtual environment (recommended)
```bash
python -m venv venv
source venv/bin/activate  # On Windows, use: venv\Scripts\activate
```

### Step 3: Install dependencies
```bash
pip install -r requirements.txt
```

### Step 4: Set up API keys
Create a `.env` file in the project root and add your Gemini API key:
```
GEMINI_API_KEY=your_api_key_here
```

To obtain a Gemini API key:
1. Visit [Google AI Studio](https://ai.google.dev/)
2. Sign up or log in
3. Navigate to the API keys section
4. Create a new API key
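At startup the application reads this key from the environment. A minimal, stdlib-only sketch of that loading step (the app itself may instead rely on the `python-dotenv` package's `load_dotenv()`; the `load_env` helper here is hypothetical):

```python
import os

def load_env(path=".env"):
    """Parse simple KEY=value lines from a .env file into os.environ."""
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                # Values already present in the real environment win.
                os.environ.setdefault(key.strip(), value.strip())

if os.path.exists(".env"):
    load_env()

api_key = os.environ.get("GEMINI_API_KEY")  # None if not configured
```

If `api_key` comes back `None`, the Gemini-dependent analysis steps will fail, so it is worth checking it before launching the interface.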

## Usage

### Starting the Application
```bash
python app.py
```
This launches a Gradio web interface accessible at http://localhost:7860

### Video File Analysis
1. Navigate to the "Video File API" tab
2. Upload a video file (MP4, AVI, MOV, etc.)
3. Enter optional ad context information:
   - Ad Description: Brief description of the advertisement
   - Ad Detail Focus: Specific aspect of the ad being analyzed
   - Ad Type/Genre: Category (Video, Funny, Serious, etc.)
4. Adjust the sampling rate (lower values = more detailed analysis, but slower processing)
5. Click "Process Video"
6. View results in the dashboard and download the CSV data for further analysis

### Webcam Analysis
1. Navigate to the "Webcam API" tab
2. Ensure your webcam is connected and permissions are granted
3. Enter optional ad context information
4. Toggle "Record Audio" on if you want to include audio analysis
5. Click "Start Session"
6. The application will display:
   - The processed webcam feed with facial landmarks
   - Live metrics visualization
   - Real-time user state analysis
7. Click "End Session" to stop recording and save the data

## Technical Details

### Architecture
- **Frontend**: Gradio web interface
- **Video Processing**: OpenCV and MediaPipe Face Mesh
- **Audio Processing**: Librosa and Transformers
- **AI Models**:
  - MediaPipe for facial landmark detection
  - Custom metrics-calculation algorithms
  - A Hugging Face audio emotion-detection model
  - Google Gemini 2.0 Flash for contextual interpretation

### Data Flow
1. Input source (video file/webcam) → Frame extraction
2. Each frame → MediaPipe → Facial landmarks
3. Landmarks → Custom algorithms → Facial metrics
4. (Optional) Audio → Hugging Face model → Audio emotion metrics
5. All metrics + Context → Gemini API → Detailed user state analysis
6. Results → Visualization and CSV export
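The per-frame path through these stages can be sketched as plain function composition. Every function below is an illustrative stub, not the app's real code; MediaPipe, the custom metric algorithms, and Gemini each stand behind one of them:

```python
def detect_landmarks(frame):
    """Stub for MediaPipe Face Mesh (stage 2)."""
    return {"left_eye": [(0, 0)], "mouth": [(0, 1)]}

def compute_metrics(landmarks):
    """Stub for the custom metric algorithms (stage 3)."""
    return {"valence": 0.5, "arousal": 0.5, "engagement": 0.5}

def analyze_state(metrics, ad_context):
    """Stub for the Gemini 2.0 Flash call (stage 5)."""
    return {**metrics, **ad_context, "user_state": "neutral"}

def process_frame(frame, ad_context):
    """Stages 2-5 for a single extracted frame."""
    landmarks = detect_landmarks(frame)
    metrics = compute_metrics(landmarks)
    return analyze_state(metrics, ad_context)

row = process_frame(frame=None, ad_context={"ad_type": "Funny"})
```

Each resulting `row` corresponds to one line of the exported CSV (stage 6).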
113
+
114
+ ### Metrics Calculation
115
+ The application extracts several facial features:
116
+ - Eye Aspect Ratio (EAR): Measures eye openness
117
+ - Mouth Aspect Ratio (MAR): Measures mouth openness
118
+ - Eyebrow Position: Detects raised or lowered eyebrows
119
+ - Head Pose: Estimates vertical and horizontal head tilt
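The EAR can be computed directly from six eye-contour landmarks. The source does not say which formulation the app uses; the sketch below follows the common Soukupová–Čech definition (vertical eyelid distances over horizontal eye width):

```python
from math import dist

def eye_aspect_ratio(p1, p2, p3, p4, p5, p6):
    """EAR = (|p2 - p6| + |p3 - p5|) / (2 * |p1 - p4|).

    p1 and p4 are the horizontal eye corners; p2, p3, p5, p6 lie on
    the upper and lower eyelids. A closing eye drives the ratio to 0.
    """
    return (dist(p2, p6) + dist(p3, p5)) / (2.0 * dist(p1, p4))

open_ear = eye_aspect_ratio((0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1))
closed_ear = eye_aspect_ratio((0, 0), (1, 0), (2, 0), (3, 0), (2, 0), (1, 0))
```

MAR follows the same pattern with mouth-contour landmarks.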

These features are used to calculate higher-level metrics:
- **Valence**: Emotional positivity/negativity (0-1)
- **Arousal**: Level of excitement/activation (0-1)
- **Dominance**: Feeling of control (0-1)
- **Cognitive Load**: Mental effort (0-1)
- **Emotional Stability**: Resilience vs. volatility (0-1)
- **Stress Index**: Level of tension (0-1)
- **Engagement**: Attention and involvement (0-1)

## Output Files

### CSV Data
The application generates CSV files with the following columns:
- `timestamp`: Time in seconds from start
- `frame_number`: Sequential frame identifier
- Facial metrics (valence, arousal, etc.)
- Audio metrics (audio_valence, audio_emotion, etc.)
- Ad context information
- `user_state`: Short summary of the detected state
- `detailed_user_analysis`: In-depth interpretation from Gemini
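A sketch of emitting this schema with the standard `csv` module; the column list is abbreviated to a few representative names, and the row values are made up for illustration:

```python
import csv
import io

fieldnames = [
    "timestamp", "frame_number", "valence", "arousal",
    "audio_valence", "audio_emotion", "user_state", "detailed_user_analysis",
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerow({
    "timestamp": 0.40, "frame_number": 12,
    "valence": 0.62, "arousal": 0.31,
    "audio_valence": 0.55, "audio_emotion": "happy",
    "user_state": "mildly amused",
    "detailed_user_analysis": "Relaxed engagement with the opening scene.",
})
csv_text = buffer.getvalue()
```

`DictWriter` handles quoting automatically, which matters for the free-text `detailed_user_analysis` column.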

### Processed Video
If enabled, the application saves an annotated video showing:
- Facial landmarks
- The current emotional state
- The detected audio emotion (if available)

## Customization

### Audio Model Selection
You can swap the audio classification model by changing this line in the code:
```python
model_name = "ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition"
```
Replace it with any compatible Hugging Face audio classification model.

### Sampling Rate
Adjust the sampling rate to balance processing speed against analysis detail:
- Lower values (1-3): Process more frames; more detailed but slower
- Higher values (10+): Process fewer frames; faster but less detailed
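Concretely, a sampling rate of N means roughly every N-th frame is analyzed. A minimal sketch of that selection (the names are illustrative, not the app's internals):

```python
def frames_to_process(total_frames, sampling_rate):
    """Indices of the frames selected for analysis: every Nth frame."""
    return [i for i in range(total_frames) if i % sampling_rate == 0]

detailed = frames_to_process(30, 1)   # all 30 frames analyzed
fast = frames_to_process(30, 10)      # only frames 0, 10, and 20
```

At 30 fps, a sampling rate of 10 therefore yields about three analyzed frames per second of video.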

### Metrics Visualization
The application uses gauge-style visualizations for the metrics. You can customize their appearance by modifying the `update_metrics_visualization` function.

## Troubleshooting

### Common Issues

**No face detected**
- Ensure adequate lighting
- Position your face within the camera frame
- Check that MediaPipe Face Mesh is properly initialized

**Audio not being analyzed**
- Verify microphone permissions
- Check that audio recording is enabled
- Ensure FFmpeg is installed (required for audio extraction)

**Slow processing**
- Increase the sampling rate
- Use a lower-resolution video
- Check system resources (CPU/GPU usage)

**Gemini API errors**
- Verify the API key is correct and active
- Check your internet connection
- Ensure you haven't exceeded API rate limits

## Dependencies

The application relies on several key libraries:
- Gradio: Web interface
- OpenCV: Video processing
- MediaPipe: Facial landmark detection
- Librosa: Audio processing
- Transformers: Hugging Face models
- Google Generative AI: Gemini API access
- MoviePy: Audio extraction from video
- Matplotlib: Data visualization
- Pandas: Data manipulation