---
language:
- en
tags:
- text-classification
- emotion
- pytorch
license: mit
datasets:
- emotion
metrics:
- accuracy
- precision
- recall
- f1
---

# DistilBERT-Base-Uncased-Emotion

## Model Description

`distilbert-base-uncased-emotion` is a DistilBERT model fine-tuned on the combined unify-emotion-datasets corpus (https://github.com/sarnthil/unify-emotion-datasets), which contains around 250K texts labeled with seven emotion categories: neutral, happy, sad, anger, disgust, surprise, and fear. The model was then further fine-tuned on a smaller set of 10K hand-tagged messages from StockTwits. It is intended for emotion detection in financial social media content such as StockTwits posts.

Training hyperparameters: maximum sequence length of 64, learning rate of 2e-5, batch size of 128, and 8 epochs. For inference instructions, see the accompanying `Inference.ipynb` notebook.
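
For quick orientation, a minimal inference sketch using the Hugging Face `transformers` text-classification pipeline might look like the following. It is not taken from `Inference.ipynb`. `MODEL_ID` is a hypothetical placeholder that must be replaced with the actual Hub ID or local path of this model, and the helper names `top_emotion` and `classify` are illustrative. Truncating to 64 tokens is an assumption that mirrors the sequence length used in training.

```python
from typing import Any, Dict, List

# Hypothetical placeholder: replace with the real Hub ID or local checkpoint path.
MODEL_ID = "path/to/distilbert-base-uncased-emotion"
# Assumption: truncate inputs to the sequence length used during training.
MAX_LENGTH = 64


def top_emotion(scores: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the highest-scoring {"label", "score"} entry for one text."""
    return max(scores, key=lambda entry: entry["score"])


def classify(texts: List[str]) -> List[Dict[str, Any]]:
    """Return the top predicted emotion for each input text."""
    # Imported lazily so top_emotion() works without transformers installed.
    from transformers import pipeline

    # top_k=None asks the pipeline to return scores for every label.
    classifier = pipeline("text-classification", model=MODEL_ID, top_k=None)
    outputs = classifier(texts, truncation=True, max_length=MAX_LENGTH)
    return [top_emotion(scores) for scores in outputs]


# Example call (requires transformers, PyTorch, and a valid MODEL_ID):
# classify(["Earnings beat expectations, loving this stock!"])
```

The label strings returned depend on the `id2label` mapping stored in the model's configuration, so check that mapping before relying on specific label names.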

## Training Data

The base training data comes from the Unify Emotion Datasets, available at https://github.com/sarnthil/unify-emotion-datasets. As noted in the model description, the model was then adapted to 10K hand-tagged StockTwits messages.

## Evaluation Metrics

The model was evaluated using the following metrics:

- Accuracy
- Precision
- Recall
- F1-score
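
As a rough illustration of how these four metrics relate, the sketch below computes them from lists of true and predicted labels in plain Python. The model card does not state how the per-class scores were averaged, so macro averaging is an assumption here, and the function name `classification_metrics` is illustrative. In practice a library routine such as scikit-learn's `precision_recall_fscore_support` would normally be used instead.

```python
from collections import Counter
from typing import Dict, List


def classification_metrics(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1 (averaging is an assumption)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    true_pos, false_pos, false_neg = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            true_pos[truth] += 1
        else:
            false_pos[pred] += 1   # predicted this label, but it was wrong
            false_neg[truth] += 1  # missed the true label

    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = true_pos[label] + false_pos[label]
        actual = true_pos[label] + false_neg[label]
        # Undefined ratios (zero denominator) are scored as 0.0.
        precision = true_pos[label] / predicted if predicted else 0.0
        recall = true_pos[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n_labels = len(labels)
    return {
        "accuracy": sum(true_pos.values()) / len(y_true),
        "precision": sum(precisions) / n_labels,
        "recall": sum(recalls) / n_labels,
        "f1": sum(f1s) / n_labels,
    }
```

Macro averaging weights every emotion class equally, which can matter when some classes are much rarer than others; a weighted or micro average would give different numbers.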

## Research

The underlying research on emotion extraction from social media is described in the paper "EmTract: Extracting Emotions from Social Media", available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3975884.

### Research using EmTract

- [Social Media Emotions and IPO Returns](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4384573)
- [Investor Emotions and Earnings Announcements](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3626025)

## License

This project is licensed under the terms of the MIT license.