---
title: Multi Modal Emotion Recognition
emoji: 📈
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
license: mit
---

# Multi Modal Emotion Recognition 📈

This application analyzes emotions in videos using pretrained models for both audio and visual content. Upload a video (maximum length: 2 minutes) to extract emotions from both speech and facial expressions in real time.

## Features:
- **Audio Emotion Detection:** Uses OpenAI's Whisper model for transcription and Cardiff NLP's RoBERTa model for emotion recognition in text.
- **Visual Emotion Analysis:** Uses Salesforce's BLIP model to caption video frames and J-Hartmann's DistilRoBERTa model to recognize emotions in the generated captions.
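
The two branches above can be sketched as a single function. This is only an illustrative outline, not the app's actual code: `analyze_video` and its four callable parameters are hypothetical names, standing in for Whisper (`transcribe`), BLIP (`caption`), and the two emotion classifiers (`classify_speech`, `classify_caption`). It also assumes the audio track and a set of sampled frames have already been extracted from the video.

```python
from typing import Any, Callable, Dict, List, Sequence


def analyze_video(
    audio: Any,
    frames: Sequence[Any],
    transcribe: Callable[[Any], str],
    caption: Callable[[Any], str],
    classify_speech: Callable[[str], str],
    classify_caption: Callable[[str], str],
) -> Dict[str, Any]:
    """Run the audio and visual branches and collect their outputs.

    All four callables are hypothetical stand-ins for the real models,
    so the data flow can be read (and tested) without loading any weights.
    """
    # Audio branch: speech -> transcript -> emotion label.
    transcript = transcribe(audio)
    audio_emotion = classify_speech(transcript)

    # Visual branch: each sampled frame -> caption -> emotion label.
    captions: List[str] = [caption(frame) for frame in frames]
    visual_emotions: List[str] = [classify_caption(text) for text in captions]

    return {
        "transcript": transcript,
        "audio_emotion": audio_emotion,
        "captions": captions,
        "visual_emotions": visual_emotions,
    }
```

Passing the models in as plain callables keeps the sketch independent of any particular library version; in practice each callable would wrap one of the models listed under "Models Used".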

## Instructions:
1. Upload a video file (maximum length: **2 minutes**).
2. The app analyzes both the audio and visual components of the video and displays the detected emotions in real time.
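
One way the 2-minute limit could be checked before any analysis starts is shown below. This is a minimal sketch under stated assumptions: the function names and the `MAX_DURATION_SECONDS` constant are hypothetical, and reading the duration through OpenCV (frame count divided by frame rate) is an assumed approach, not necessarily what the app does.

```python
MAX_DURATION_SECONDS = 120.0  # the 2-minute limit from the instructions


def is_within_limit(duration_seconds: float, limit: float = MAX_DURATION_SECONDS) -> bool:
    """Return True if the duration is positive and no longer than the limit."""
    return 0 < duration_seconds <= limit


def video_duration_seconds(path: str) -> float:
    """Estimate a video's duration as frame count / frames per second.

    Assumes OpenCV (opencv-python) is installed; it is imported lazily so
    that is_within_limit can be used without it.
    """
    import cv2

    capture = cv2.VideoCapture(path)
    try:
        fps = capture.get(cv2.CAP_PROP_FPS)
        frame_count = capture.get(cv2.CAP_PROP_FRAME_COUNT)
    finally:
        capture.release()

    if fps <= 0:
        raise ValueError(f"Could not read the frame rate of {path!r}")
    return frame_count / fps
```

A caller would typically reject the upload with a clear message when `is_within_limit(video_duration_seconds(path))` is false, rather than starting the analysis and failing midway.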

## Models Used:
The models were handpicked after numerous trials. The models and their corresponding research papers are listed below:

1. **Cardiff NLP RoBERTa for Emotion Recognition from Text:**
   - [Model: cardiffnlp/twitter-roberta-base-emotion](https://huggingface.co/cardiffnlp/twitter-roberta-base-emotion)
   - [Paper: TweetEval - Unified Benchmark and Comparative Evaluation for Tweet Classification](https://arxiv.org/pdf/2010.12421)

2. **Salesforce BLIP for Image Captioning and Visual Emotion Analysis:**
   - [Model: Salesforce/blip-image-captioning-base](https://huggingface.co/Salesforce/blip-image-captioning-base)
   - [Paper: BLIP - Bootstrapping Language-Image Pre-training](https://arxiv.org/abs/2201.12086)

3. **J-Hartmann DistilRoBERTa for Emotion Recognition from Image Captions:**
   - [Model: j-hartmann/emotion-english-distilroberta-base](https://huggingface.co/j-hartmann/emotion-english-distilroberta-base)

4. **OpenAI Whisper for Speech-to-Text Transcription:**
   - [Model: openai/whisper-base](https://huggingface.co/openai/whisper-base)
   - [Paper: Whisper - Speech Recognition](https://arxiv.org/abs/2212.04356)

These models were selected based on extensive trials to ensure the best performance for this multimodal emotion recognition task.
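
The four checkpoints listed above can be loaded with the Hugging Face `transformers` `pipeline` helper. The sketch below is one possible arrangement, not the app's confirmed code: the `MODEL_IDS` mapping, its role names, and `load_model` are hypothetical, and the task strings (in particular `image-to-text`) are assumptions that may differ across `transformers` versions.

```python
# Hypothetical registry: role -> (pipeline task, model ID from the list above).
MODEL_IDS = {
    "transcription": ("automatic-speech-recognition", "openai/whisper-base"),
    "captioning": ("image-to-text", "Salesforce/blip-image-captioning-base"),
    "speech_emotion": ("text-classification", "cardiffnlp/twitter-roberta-base-emotion"),
    "caption_emotion": (
        "text-classification",
        "j-hartmann/emotion-english-distilroberta-base",
    ),
}


def load_model(role: str):
    """Build the pipeline for one role; downloads weights on first use.

    transformers is imported lazily so that the registry can be inspected
    without the library (or the model weights) being available.
    """
    from transformers import pipeline

    task, model_id = MODEL_IDS[role]
    return pipeline(task, model=model_id)
```

Loading each model once at startup and reusing it across requests avoids paying the download and initialization cost for every uploaded video.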

## Access the App:
You can try the app [here](https://huggingface.co/spaces/Pradheep1647/multi-modal-emotion-recognition).

## License:
This project is licensed under the MIT License.