---
license: mit
language:
- en
pipeline_tag: text-classification
tags:
- finance
- financial-sentiment-analysis
- sentiment-analysis
library_name: transformers
widget:
- text: unemployment hits record low as job opportunities soar
- text: unemployment hits record high as job opportunities suffers
---

`Sentiment-xDistil` is a model based on
[`xtremedistil-l12-h384-uncased`](https://huggingface.co/microsoft/xtremedistil-l12-h384-uncased),
fine-tuned to classify the sentiment of news headlines on a dataset annotated by
[ChatGPT 3.5](https://platform.openai.com/docs/models/gpt-3-5). It was built, together with
[`Topic-xDistil`](https://huggingface.co/hakonmh/topic-xdistil-uncased),
as a tool for identifying financial news headlines and classifying their sentiment.
The code used to train both models and build the dataset is available [here](https://github.com/hakonmh/distilnews).

*Notes*: The output label is one of `Negative`, `Neutral`, or `Positive`. The model is intended for English text.

## Performance Results

Here are the performance metrics for both models, each on its own test set:

| Model | Test Set Size | Accuracy | F1 Score |
| --- | --- | --- | --- |
| `topic-xdistil-uncased` | 32 799 | 94.44 % | 92.59 % |
| `sentiment-xdistil-uncased` | 17 527 | 94.59 % | 93.44 % |
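
For readers who want to compute comparable numbers on their own labelled headlines, here is a minimal, dependency-free sketch of the two metrics. The card does not say how the F1 score is averaged, so macro-averaging is an assumption here; the function names and the toy labels are made up for illustration and are not taken from the test set.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-label F1 scores (assumed averaging)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # label p was predicted, but it was wrong
            fn[t] += 1  # true label t was missed
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up labels for illustration only.
y_true = ["Positive", "Negative", "Neutral", "Positive", "Negative", "Neutral"]
y_pred = ["Positive", "Negative", "Neutral", "Neutral", "Negative", "Neutral"]

print(f"Accuracy: {accuracy(y_true, y_pred):.2%}")
print(f"Macro F1: {macro_f1(y_true, y_pred):.2%}")
```

If the reported F1 uses a different averaging scheme (for example weighted by class frequency), the second function would need to change accordingly.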

## Data

The training data consists of 300k+ news headlines and tweets, and was annotated by
[ChatGPT 3.5](https://platform.openai.com/docs/models/gpt-3-5), which has been shown to
[outperform crowd-workers for text annotation tasks](https://arxiv.org/pdf/2303.15056.pdf).

The sentence labels are defined by the ChatGPT prompt as follows:

```python
"""
[...]
Does the headline convey a Positive, Neutral, or Negative sentiment with \
regard to the current state or potential future impact on the economy or \
the asset described?
- Positive sentiment headlines suggest growth, improvement, or \
stability in economic conditions.
- Neutral sentiment headlines do not clearly indicate a positive or \
negative impact on the economy.
- Negative sentiment headlines imply economic decline, uncertainty, \
or unfavorable conditions.
[...]
"""
```

## Example Usage

Here's a simple example:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model = AutoModelForSequenceClassification.from_pretrained("hakonmh/sentiment-xdistil-uncased")
tokenizer = AutoTokenizer.from_pretrained("hakonmh/sentiment-xdistil-uncased")

SENTENCE = "Global Growth Surges as New Technologies Drive Innovation and Productivity!"
inputs = tokenizer(SENTENCE, return_tensors="pt")
output = model(**inputs).logits
# Map the index of the highest logit to its label name
predicted_label = model.config.id2label[output.argmax(-1).item()]

print(predicted_label)
```

```text
Positive
```
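
The example above keeps only the top label. To see how confident the model is in each of the three labels, the logits can be passed through a softmax. The sketch below does this in plain Python so that it runs without downloading the model; the logit values and the index order in `id2label` are hypothetical, and the real ones should be read from `model(**inputs).logits` and `model.config.id2label`.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    peak = max(logits)
    # Subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical values for illustration only.
id2label = {0: "Negative", 1: "Neutral", 2: "Positive"}
logits = [-2.1, -0.3, 4.2]

probs = softmax(logits)
scores = {id2label[i]: p for i, p in enumerate(probs)}
best = max(scores, key=scores.get)

print(scores)
print(best)
```

With the PyTorch tensor from the example above, `torch.softmax(output, dim=-1)` gives the same probabilities directly.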

Or, as a pipeline together with `Topic-xDistil`:

```python
from transformers import pipeline

# "text-classification" is the general task name; "sentiment-analysis" is an alias for it
topic_classifier = pipeline("text-classification",
                            model="hakonmh/topic-xdistil-uncased",
                            tokenizer="hakonmh/topic-xdistil-uncased")
sentiment_classifier = pipeline("text-classification",
                                model="hakonmh/sentiment-xdistil-uncased",
                                tokenizer="hakonmh/sentiment-xdistil-uncased")

SENTENCE = "Global Growth Surges as New Technologies Drive Innovation and Productivity!"
print(topic_classifier(SENTENCE))
print(sentiment_classifier(SENTENCE))
```

```text
[{'label': 'Economics', 'score': 0.9970171451568604}]
[{'label': 'Positive', 'score': 0.9997037053108215}]
```

Tested with `transformers` 4.30.1 and `torch` 2.0.0.
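
The two models are described above as a tool for picking out financial headlines and then classifying their sentiment. A minimal sketch of that two-step flow follows. It assumes that `Economics`, the label seen in the sample output, marks a financial headline. The function name `classify_financial_headlines`, its `financial_label` parameter, the stand-in classifiers, and the `Other` label are hypothetical and are not part of either model's API.

```python
def classify_financial_headlines(headlines, topic_classifier, sentiment_classifier,
                                 financial_label="Economics"):
    """Keep headlines tagged as financial and attach a sentiment label to each.

    Both classifiers are expected to behave like `transformers` pipelines:
    given a list of strings, they return one {'label': ..., 'score': ...}
    dict per string.
    """
    results = []
    topics = topic_classifier(headlines)
    for headline, topic in zip(headlines, topics):
        if topic["label"] != financial_label:
            continue  # not a financial headline, so skip sentiment scoring
        sentiment = sentiment_classifier([headline])[0]
        results.append({"headline": headline,
                        "sentiment": sentiment["label"],
                        "score": sentiment["score"]})
    return results


# Stand-in classifiers (hypothetical) so the sketch runs without downloading
# the models; for real use, pass two `transformers` pipelines instead.
def fake_topic_classifier(texts):
    return [{"label": "Economics" if "growth" in t.lower() else "Other", "score": 1.0}
            for t in texts]


def fake_sentiment_classifier(texts):
    return [{"label": "Positive", "score": 1.0} for _ in texts]


headlines = ["Global Growth Surges as New Technologies Drive Innovation",
             "Local Team Wins Championship"]
results = classify_financial_headlines(headlines, fake_topic_classifier,
                                       fake_sentiment_classifier)
print(results)
```

Passing the classifiers in as arguments keeps the filtering logic independent of how the models are loaded, which also makes it easy to try out with stubs like the ones here.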