---
language: 
- en
tags:
- text-classification
- emotion
- pytorch
license: mit
datasets:
- emotion
metrics:
- accuracy
- precision
- recall
- f1
---

# EmTract (DistilBERT-Base-Uncased)

## Model Description

`emtract-distilbert-base-uncased-emotion` is a DistilBERT model fine-tuned on the combined corpora from [unify-emotion-datasets](https://github.com/sarnthil/unify-emotion-datasets), around 250K texts labeled with seven emotion categories: neutral, happy, sad, anger, disgust, surprise, and fear. It was then adapted to a smaller set of 10K hand-tagged messages from StockTwits. The model is intended for emotion detection in financial social media content such as StockTwits posts.

Training hyperparameters were as follows: sequence length of 64, learning rate of 2e-5, batch size of 128, and 8 epochs. For steps on how to use the model for inference, see the accompanying `Inference.ipynb` notebook.
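As a rough illustration of what these settings imply, the sketch below collects the stated hyperparameters in a small config object and estimates the number of optimizer steps in a full run. The names `FineTuneConfig` and `total_optimizer_steps` are hypothetical, the 250,000 figure is the approximate corpus size quoted above rather than an exact count, and the calculation assumes one optimizer step per batch with no gradient accumulation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters as stated in this model card (class name is illustrative)."""
    sequence_length: int = 64
    learning_rate: float = 2e-5
    batch_size: int = 128
    num_epochs: int = 8


def total_optimizer_steps(num_examples: int, config: FineTuneConfig) -> int:
    """Estimate optimizer steps, assuming one step per batch and no gradient accumulation."""
    # The last batch of an epoch may be partial, so round up.
    steps_per_epoch = math.ceil(num_examples / config.batch_size)
    return steps_per_epoch * config.num_epochs


config = FineTuneConfig()
# 250_000 is an approximation of the corpus size, not an exact count.
print(total_optimizer_steps(250_000, config))
```

Under these assumptions the run comes to roughly 15.6K optimizer steps; the actual number depends on the exact dataset size and training setup.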

## Training Data

The training data was obtained from the [Unify Emotion Datasets](https://github.com/sarnthil/unify-emotion-datasets) collection, together with the hand-tagged StockTwits messages described above.

## Evaluation Metrics

The model was evaluated using the following metrics:
- Accuracy
- Precision
- Recall
- F1-score
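For readers who want to reproduce these metrics on their own predictions, here is a minimal pure-Python sketch. This card does not say how per-class scores were averaged, so the macro average used here is an assumption, and the function name `classification_metrics` and the toy labels are illustrative only.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # missed the true class

    def safe_div(num, den):
        # Treat undefined ratios (zero denominator) as 0.0.
        return num / den if den else 0.0

    precisions, recalls, f1s = [], [], []
    for label in labels:
        precision = safe_div(tp[label], tp[label] + fp[label])
        recall = safe_div(tp[label], tp[label] + fn[label])
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(safe_div(2 * precision * recall, precision + recall))

    n = len(labels)
    return {
        "accuracy": safe_div(sum(tp.values()), len(y_true)),
        "precision": safe_div(sum(precisions), n),
        "recall": safe_div(sum(recalls), n),
        "f1": safe_div(sum(f1s), n),
    }


# Toy example using three of the seven emotion labels.
y_true = ["happy", "sad", "anger", "happy"]
y_pred = ["happy", "sad", "happy", "happy"]
print(classification_metrics(y_true, y_pred))
```

Macro averaging weights every emotion class equally, which can be useful when some classes are rare; a weighted or micro average would give different numbers on imbalanced data.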

## Research

The underlying research for emotion extraction from social media can be found in the paper [EmTract: Extracting Emotions from Social Media](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3975884).

### Research using EmTract

[Social Media Emotions and IPO Returns](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4384573)

[Investor Emotions and Earnings Announcements](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3626025)

## License

This project is licensed under the terms of the MIT license.