avichr committed on
Commit
9c5f52b
1 Parent(s): ead3ce1

Create README.md

Files changed (1)
  1. README.md +101 -0
README.md ADDED
@@ -0,0 +1,101 @@
+ # HebEMO - Emotion Recognition Model for Modern Hebrew
+ <img align="right" src="https://github.com/avichaychriqui/HeBERT/blob/main/data/heBERT_logo.png?raw=true" width="250">
+
+ HebEMO is a tool that detects polarity and extracts emotions from modern Hebrew User-Generated Content (UGC). It was trained on a unique Covid-19-related dataset that we collected and annotated.
+
+ HebEMO achieved a weighted average F1-score of 0.96 for polarity classification.
+ Emotion detection reached F1-scores of 0.78-0.97 for every emotion except *surprise*, which the model failed to capture (F1 = 0.41). These results exceed the best previously reported performance, even compared with results reported for English.
+
+ ## Emotion UGC Data Description
+ Our UGC data includes comments posted on news articles collected from 3 major Israeli news sites between January 2020 and August 2020. The total size of the data is ~150 MB, including over 7 million words and 350K sentences.
+ ~2000 sentences were annotated by crowd members (3-10 annotators per sentence) for overall sentiment (polarity) and [eight emotions](https://en.wikipedia.org/wiki/Robert_Plutchik#Plutchik's_wheel_of_emotions): anger, disgust, anticipation, fear, joy, sadness, surprise and trust.
+ The fraction of sentences in which each emotion appeared is shown in the table below.
+
+ | | anger | disgust | anticipation | fear | joy | sadness | surprise | trust | sentiment |
+ |------:|------:|--------:|-------------:|-----:|----:|--------:|---------:|------:|----------:|
+ | **ratio** | 0.78 | 0.83 | 0.58 | 0.45 | 0.12 | 0.59 | 0.17 | 0.11 | 0.25 |
+
+
+
+ ## Performance
+ ### Emotion Recognition
+ | emotion | f1-score | precision | recall |
+ |--------------|----------|-----------|----------|
+ | anger | 0.96 | 0.99 | 0.93 |
+ | disgust | 0.97 | 0.98 | 0.96 |
+ | anticipation | 0.82 | 0.80 | 0.87 |
+ | fear | 0.79 | 0.88 | 0.72 |
+ | joy | 0.90 | 0.97 | 0.84 |
+ | sadness | 0.90 | 0.86 | 0.94 |
+ | surprise | 0.40 | 0.44 | 0.37 |
+ | trust | 0.83 | 0.86 | 0.80 |
+
+ *The above metrics are for the positive class (that is, the emotion is reflected in the text).*
+
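As a quick sanity check on the emotion table above, recall that the F1-score is the harmonic mean of precision and recall. The sketch below recomputes it for a few rows. The helper name `f1_score` is ours and is not part of HebEMO, and small differences from the reported values can arise because the precision and recall in the table are already rounded.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the emotion table above
rows = {
    "anger": (0.99, 0.93),
    "disgust": (0.98, 0.96),
    "surprise": (0.44, 0.37),
}

for emotion, (p, r) in rows.items():
    print(f"{emotion}: F1 = {f1_score(p, r):.2f}")
```

For these three rows the recomputed values agree with the f1-score column after rounding to two decimals.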
+ ### Sentiment (Polarity) Analysis
+ | | precision | recall | f1-score |
+ |--------------|-----------|--------|----------|
+ | neutral | 0.83 | 0.56 | 0.67 |
+ | positive | 0.96 | 0.92 | 0.94 |
+ | negative | 0.97 | 0.99 | 0.98 |
+ | accuracy | | | 0.97 |
+ | macro avg | 0.92 | 0.82 | 0.86 |
+ | weighted avg | 0.96 | 0.97 | 0.96 |
+
+ *The sentiment (polarity) analysis model is also available on AWS. For more information, visit the [AWS samples repository](https://github.com/aws-samples/aws-lambda-docker-serverless-inference/tree/main/hebert-sentiment-analysis-inference-docker-lambda).*
+
+ ## How to use
+
+ ### Emotion Recognition Model
+ An online demo is available on [Hugging Face Spaces](https://huggingface.co/spaces/avichr/HebEMO_demo) or as a [Colab notebook](https://colab.research.google.com/drive/1Jw3gOWjwVMcZslu-ttXoNeD17lms1-ff?usp=sharing).
+ ```
+ # Notebook (Colab/Jupyter) syntax: lines starting with "!" are shell commands.
+ # !pip install pyplutchik==0.0.7
+ # !pip install transformers==4.14.1
+
+ !git clone https://github.com/avichaychriqui/HeBERT.git
+ from HeBERT.src.HebEMO import *
+ HebEMO_model = HebEMO()
+
+ # Analyze a text file; returns the analysis as a pandas.DataFrame
+ HebEMO_model.hebemo(input_path = 'data/text_example.txt')
+
+ # Analyze a single string ("Life is beautiful and happy") and plot the result
+ hebEMO_df = HebEMO_model.hebemo(text='החיים יפים ומאושרים', plot=True)
+ ```
+ <img src="https://github.com/avichaychriqui/HeBERT/blob/main/data/hebEMO1.png?raw=true" width="300" height="300" />
+
+
+
+ ### Sentiment Classification Model (polarity only)
+     from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
+
+     tokenizer = AutoTokenizer.from_pretrained("avichr/heBERT_sentiment_analysis")  # same as the 'avichr/heBERT' tokenizer
+     model = AutoModelForSequenceClassification.from_pretrained("avichr/heBERT_sentiment_analysis")
+
+     # Build a pipeline that returns the score of every label
+     sentiment_analysis = pipeline(
+         "sentiment-analysis",
+         model="avichr/heBERT_sentiment_analysis",
+         tokenizer="avichr/heBERT_sentiment_analysis",
+         return_all_scores=True
+     )
+
+     sentiment_analysis('אני מתלבט מה לאכול לארוחת צהריים')  # "I'm debating what to eat for lunch"
+     # [[{'label': 'neutral', 'score': 0.9978172183036804},
+     #   {'label': 'positive', 'score': 0.0014792329166084528},
+     #   {'label': 'negative', 'score': 0.0007035882445052266}]]
+
+     sentiment_analysis('קפה זה טעים')  # "Coffee is tasty"
+     # [[{'label': 'neutral', 'score': 0.00047328314394690096},
+     #   {'label': 'positive', 'score': 0.9994067549705505},
+     #   {'label': 'negative', 'score': 0.00011996887042187154}]]
+
+     sentiment_analysis('אני לא אוהב את העולם')  # "I don't like the world"
+     # [[{'label': 'neutral', 'score': 9.214012970915064e-05},
+     #   {'label': 'positive', 'score': 8.876807987689972e-05},
+     #   {'label': 'negative', 'score': 0.9998190999031067}]]
+
+
+
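With `return_all_scores` enabled, the pipeline returns one list per input text, each holding a `{'label', 'score'}` dictionary for every class. A minimal sketch of reducing that to the single most likely label follows. `top_label` is a hypothetical helper, not part of `transformers`, and the sample data is copied from the first example output above so the snippet runs without downloading the model.

```python
from typing import Dict, List, Tuple


def top_label(scores: List[Dict[str, object]]) -> Tuple[str, float]:
    """Return the (label, score) pair with the highest score."""
    best = max(scores, key=lambda item: item["score"])
    return best["label"], best["score"]


# Output copied from the first example above (one inner list per input text)
output = [[
    {"label": "neutral", "score": 0.9978172183036804},
    {"label": "positive", "score": 0.0014792329166084528},
    {"label": "negative", "score": 0.0007035882445052266},
]]

for scores in output:
    label, score = top_label(scores)
    print(f"{label} ({score:.3f})")
```

Keeping all scores and reducing them afterwards leaves room for other rules, such as rejecting predictions whose top score falls below a chosen threshold.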
+ The tool was developed by the Coller Semitic Languages AI Lab at the Coller School of Management, Tel Aviv University.
+
+