---
language: pt
license: apache-2.0

widget:
- text: "O futuro de DI caiu 20 bps nesta manhã"
  example_title: "Example 1"
- text: "O Nubank decidiu cortar a faixa de preço da oferta pública inicial (IPO) após revés no humor dos mercados internacionais com as fintechs."
  example_title: "Example 2"
- text: "O Ibovespa acompanha correção do mercado e fecha com alta moderada"
  example_title: "Example 3"
---

# FinBERT-PT-BR: Financial BERT PT BR

FinBERT-PT-BR is a pre-trained NLP model for sentiment analysis of financial texts in Brazilian Portuguese.

The model was trained in two main stages: language modeling and sentiment modeling. In the first stage, a language model was trained on more than 1.4 million financial news texts in Portuguese.
In the second stage, this language model served as the starting point for a sentiment classifier, which converged satisfactorily with only a small labeled set (500 texts).

The accompanying work (see the Paper section) also presents a comparative analysis with other models and possible applications of the model.
In the comparative analysis, FinBERT-PT-BR achieved better results than the state-of-the-art models it was compared with.
The applications demonstrated include building sentiment indices, investment strategies, and analysis of macroeconomic data such as inflation.

## Applications

### Sentiment Index

![Sentiment Index](sentiment_index_and_economy.png)
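
How the index in the figure was computed is not described in this card. As a rough illustration only, the sketch below shows one simple way predicted labels could be aggregated into a daily index: each label is mapped to a score and the scores are averaged per date. The scoring scheme (`LABEL_SCORE`), the function name `daily_sentiment_index`, and the sample data are all assumptions made for this example, not the author's method.

```python
from collections import defaultdict

# Assumed scoring scheme for illustration; not taken from the paper.
LABEL_SCORE = {"POSITIVE": 1.0, "NEGATIVE": -1.0, "NEUTRAL": 0.0}


def daily_sentiment_index(labeled_texts):
    """Average the label scores of all texts that share a date.

    labeled_texts: iterable of (date_string, label) pairs.
    Returns a dict mapping each date to a value in [-1, 1], ordered by date.
    """
    scores_by_date = defaultdict(list)
    for date, label in labeled_texts:
        scores_by_date[date].append(LABEL_SCORE[label])
    return {
        date: sum(scores) / len(scores)
        for date, scores in sorted(scores_by_date.items())
    }


# Hypothetical predictions (date, label); not real model output.
sample = [
    ("2022-01-03", "POSITIVE"),
    ("2022-01-03", "NEGATIVE"),
    ("2022-01-03", "POSITIVE"),
    ("2022-01-04", "NEUTRAL"),
    ("2022-01-04", "NEGATIVE"),
]
index = daily_sentiment_index(sample)
```

A value near 1 means mostly positive news on that date and a value near -1 mostly negative. In practice the labels would come from the model's predictions on dated news texts.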


## Usage

#### BertForSequenceClassification

```python
import numpy as np
import torch
from transformers import AutoTokenizer, BertForSequenceClassification

# Maps the index of the highest logit to its sentiment label.
pred_mapper = {
    0: "POSITIVE",
    1: "NEGATIVE",
    2: "NEUTRAL",
}

tokenizer = AutoTokenizer.from_pretrained("lucas-leme/FinBERT-PT-BR")
finbertptbr = BertForSequenceClassification.from_pretrained("lucas-leme/FinBERT-PT-BR")

# Tokenize a batch, padding to the longest text and truncating at 512 tokens.
tokens = tokenizer(["Hoje a bolsa caiu", "Hoje a bolsa subiu"], return_tensors="pt",
                   padding=True, truncation=True, max_length=512)

# Gradients are not needed for inference.
with torch.no_grad():
    finbertptbr_outputs = finbertptbr(**tokens)

# One label per input text, chosen by the largest logit.
preds = [pred_mapper[int(np.argmax(pred))] for pred in finbertptbr_outputs.logits.cpu().numpy()]
```
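
The snippet above keeps only the top label for each text. If a confidence value is also useful, the logits can be turned into probabilities with a softmax. The following is a minimal sketch in plain Python. The helper names `softmax` and `label_with_confidence` and the example logits are hypothetical; the label order is copied from `pred_mapper`.

```python
import math

# Label order copied from pred_mapper in the snippet above.
PRED_MAPPER = {0: "POSITIVE", 1: "NEGATIVE", 2: "NEUTRAL"}


def softmax(logits):
    """Convert one row of raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return PRED_MAPPER[best], probs[best]


# Hypothetical logits for two texts (illustrative values, not model output).
example_logits = [[-1.2, 2.3, 0.1], [2.0, -0.5, 0.3]]
results = [label_with_confidence(row) for row in example_logits]
```

With real model output, the rows could be obtained from `finbertptbr_outputs.logits.detach().tolist()`.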

#### Pipeline

```python
from transformers import (
    AutoTokenizer, 
    BertForSequenceClassification,
    pipeline,
)

finbert_pt_br_tokenizer = AutoTokenizer.from_pretrained("lucas-leme/FinBERT-PT-BR")
finbert_pt_br_model = BertForSequenceClassification.from_pretrained("lucas-leme/FinBERT-PT-BR")

finbert_pt_br_pipeline = pipeline(task='text-classification', model=finbert_pt_br_model, tokenizer=finbert_pt_br_tokenizer)
finbert_pt_br_pipeline(['Hoje a bolsa caiu', 'Hoje a bolsa subiu'])
```

## Author

- [Lucas Leme](https://www.linkedin.com/in/lucas-leme-santos/)

## Paper

- Paper: [FinBERT-PT-BR: Sentiment Analysis of Texts in Portuguese from the Financial Market](https://sol.sbc.org.br/index.php/bwaif/article/view/24960)
- Undergraduate thesis: [FinBERT-PT-BR: Análise de sentimentos de textos em português referentes ao mercado financeiro](https://pcs.usp.br/pcspf/wp-content/uploads/sites/8/2022/12/Monografia_PCS3860_COOP_2022_Grupo_C12.pdf)