lucas-leme committed
Commit 7194c24
1 Parent(s): 334a67e

Upload 4 files

README.md CHANGED
@@ -13,20 +13,44 @@ widget:
 
 # FinBertPTBR : Financial Bert PT BR
 
- FinBertPTBR is a pre-trained NLP model to analyze sentiment of Brazilian Portuguese financial texts. It is built by further training the BERTimbau language model in the finance domain, using a large financial corpus and thereby fine-tuning it for financial sentiment classification.
+ FinBertPTBR is a pre-trained NLP model for sentiment analysis of Brazilian Portuguese financial texts.
+
+ The model was trained in two main stages: language modeling and sentiment modeling. In the first stage, a language model was trained on more than 1.4 million Brazilian Portuguese financial news texts.
+ Starting from this language model, a sentiment classifier was then built with only a few labeled texts (500) and still converged satisfactorily.
+
+ The work concludes with a comparative analysis against other models and a discussion of possible applications of the developed model.
+ In the comparative analysis, the developed model outperformed the current state-of-the-art models.
+ Among the applications, the model was shown to be useful for building sentiment indices, investment strategies, and analyses of macroeconomic data such as inflation.
+
+ ## Applications
+
+ ### Sentiment Index
+
+ ![Sentiment Index](sentiment_index_and_economy.png)
+
+ ### Inflation Analysis
+
+ ![Inflation Analysis](sentiment_inflation.png)
 
 ## Usage
 ```python
- from transformers import AutoTokenizer, AutoModel
-
- tokenizer = AutoTokenizer.from_pretrained("turing-usp/FinBertPTBR")
- model = AutoModel.from_pretrained("turing-usp/FinBertPTBR")
+ from transformers import AutoTokenizer, BertForSequenceClassification
+ import numpy as np
+
+ pred_mapper = {0: 'negative', 1: 'positive', 2: 'neutral'}
+
+ tokenizer = AutoTokenizer.from_pretrained("lucas-leme/FinBERT-PT-BR")
+ finbertptbr = BertForSequenceClassification.from_pretrained("lucas-leme/FinBERT-PT-BR")
+
+ tokens = tokenizer(["Hoje a bolsa caiu", "Hoje a bolsa subiu"], return_tensors="pt",
+                    padding=True, truncation=True, max_length=512)
+ finbertptbr_outputs = finbertptbr(**tokens)
+ preds = [pred_mapper[np.argmax(pred)] for pred in finbertptbr_outputs.logits.cpu().detach().numpy()]
 ```
 
- ## Authors
+ ## Author
 
- - [Vinicius Carmo](https://www.linkedin.com/in/vinicius-cleves/)
- - [Julia Pocciotti](https://www.linkedin.com/in/juliapocciotti/)
- - [Luísa Heise](https://www.linkedin.com/in/lu%C3%ADsa-mendes-heise/)
 - [Lucas Leme](https://www.linkedin.com/in/lucas-leme-santos/)
-
+
+ ## Paper - Stay tuned!
+
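The new README describes a two-stage training procedure (domain language modeling on financial news, then sentiment classification on roughly 500 labeled texts) but does not show the second stage. Below is a minimal sketch of what that fine-tuning step could look like with the transformers API; the base checkpoint (`neuralmind/bert-base-portuguese-cased`, i.e. plain BERTimbau rather than the domain-adapted model), the toy labeled examples, and the hyperparameters are all assumptions for illustration, not taken from this commit.

```python
# Illustrative sketch of the second training stage (sentiment classification).
# The checkpoint, data, and hyperparameters are placeholders, not from the commit.
import torch
from transformers import AutoTokenizer, BertForSequenceClassification

base = "neuralmind/bert-base-portuguese-cased"  # BERTimbau; the real pipeline starts from its domain-adapted version
tokenizer = AutoTokenizer.from_pretrained(base)
model = BertForSequenceClassification.from_pretrained(base, num_labels=3)

# Tiny labeled set using the README's mapping: 0 = negative, 1 = positive, 2 = neutral
texts = ["Hoje a bolsa caiu", "Hoje a bolsa subiu", "A reunião foi adiada"]
labels = torch.tensor([0, 1, 2])

batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=512)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few passes over the toy batch
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```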
config.json CHANGED
@@ -10,9 +10,9 @@
   "hidden_dropout_prob": 0.1,
   "hidden_size": 768,
   "id2label": {
-    "0": "LABEL_0",
-    "1": "LABEL_1",
-    "2": "LABEL_2"
+    "0": "POSITIVE",
+    "1": "NEGATIVE",
+    "2": "NEUTRAL"
   },
   "initializer_range": 0.02,
   "intermediate_size": 3072,
@@ -40,4 +40,4 @@
   "type_vocab_size": 2,
   "use_cache": true,
   "vocab_size": 29794
- }
+ }
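With the `id2label` entries renamed from the generic `LABEL_0`/`LABEL_1`/`LABEL_2` to `POSITIVE`/`NEGATIVE`/`NEUTRAL`, downstream code can read label names straight from the model config instead of keeping a separate mapping. A minimal sketch; the input sentence and printed output are illustrative:

```python
# Minimal sketch: the generic text-classification pipeline picks up the
# human-readable label names from the updated id2label in config.json.
# The input sentence and the printed output are illustrative.
from transformers import pipeline

classifier = pipeline("text-classification", model="lucas-leme/FinBERT-PT-BR")
print(classifier("Hoje a bolsa subiu"))
# e.g. [{'label': 'POSITIVE', 'score': ...}]  -- label name comes from id2label
```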
sentiment_index_and_economy.png ADDED
sentiment_inflation.png ADDED
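The two images added in this commit illustrate the sentiment-index and inflation-analysis applications mentioned in the README. As a rough idea of how such an index could be assembled from the model's outputs, here is a hedged sketch: the headlines, dates, and aggregation rule (share of positive minus share of negative predictions per day) are assumptions for illustration only, not the method behind the plots.

```python
# Hedged sketch of a daily sentiment index built from model predictions.
# Headlines, dates, and the aggregation rule are illustrative assumptions.
from collections import defaultdict

from transformers import AutoTokenizer, BertForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("lucas-leme/FinBERT-PT-BR")
model = BertForSequenceClassification.from_pretrained("lucas-leme/FinBERT-PT-BR")

headlines = [  # placeholder dated headlines
    ("2023-01-02", "Hoje a bolsa caiu"),
    ("2023-01-02", "Mercado reage mal ao anúncio"),
    ("2023-01-03", "Hoje a bolsa subiu"),
]

by_day = defaultdict(list)
for day, text in headlines:
    tokens = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    pred_id = model(**tokens).logits.argmax(dim=-1).item()
    by_day[day].append(model.config.id2label[pred_id])  # names come from config.json

for day, labels in sorted(by_day.items()):
    index = (labels.count("POSITIVE") - labels.count("NEGATIVE")) / len(labels)
    print(day, round(index, 2))
```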