Commit d36cdbb
1 Parent(s): b82d8da
Create README.md
README.md ADDED
@@ -0,0 +1,31 @@
# RePublic

### Model description
RePublic (reputation analyzer for public agencies) is a Dutch BERT model based on BERTje (De Vries, 2019).

### Intended use
The model was designed to predict the sentiment in Dutch-language news article text about public agencies.

### How to use
The model can be loaded and used to make predictions as follows:

```
from transformers import pipeline

# Load the model and its tokenizer from the Hugging Face Hub
model_path = 'clips/republic'
pipe = pipeline(task="text-classification", model=model_path, tokenizer=model_path)

text = … # load your text here
output = pipe(text)  # a list with one dict per input, holding 'label' and 'score'
prediction = output[0]['label'] # 0="neutral"; 1="positive"; 2="negative"
```
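The comment in the snippet above gives the class indices. A minimal sketch of turning the pipeline's `label` field into a readable sentiment name could look as follows. The exact label string the model emits is not stated here, so the hypothetical helper `label_to_sentiment` assumes it is a bare index (`'2'`), the default Transformers form (`'LABEL_2'`), or already a sentiment name; adjust it if the model's config uses something else.

```python
# Index-to-name mapping taken from the comment in the usage snippet above.
ID2SENTIMENT = {0: "neutral", 1: "positive", 2: "negative"}


def label_to_sentiment(label):
    """Map a pipeline label to a sentiment name.

    Hypothetical helper: assumes the label is an int, a digit string such
    as '2', the default Transformers form 'LABEL_2', or a sentiment name.
    """
    text = str(label).strip()
    if text.lower() in ID2SENTIMENT.values():
        return text.lower()  # the label is already a sentiment name
    index = int(text.rsplit("_", 1)[-1])  # 'LABEL_2' -> 2, '2' -> 2
    return ID2SENTIMENT[index]


# Hand-written example in the pipeline's output format (not a real prediction).
example_output = [{"label": "LABEL_2", "score": 0.93}]
print(label_to_sentiment(example_output[0]["label"]))  # prints "negative"
```

With the real pipeline, the same call would be `label_to_sentiment(output[0]['label'])`.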
### Training and data procedure
RePublic was domain-adapted on 91,661 Flemish news articles that mention public agencies, taken from three popular Flemish news providers (“Het Laatste Nieuws”, “Het Nieuwsblad” and “De Morgen”). Domain adaptation was done by further training on BERT’s language modeling tasks (masked language modeling and next sentence prediction).

The model was then fine-tuned on a three-class sentiment classification task (“positive”, “negative”, “neutral”). The fine-tuning data consisted of 4,404 annotated sentences mentioning Flemish public agencies. Fine-tuning was performed for 4 epochs with a batch size of 8 and a learning rate of 5e-5.

[TABLE 1: STATISTICS OF FINE-TUNING DATA]
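To make the reported fine-tuning setup concrete, the sketch below collects the stated hyperparameters in a hypothetical `FineTuneConfig` class and derives the number of optimizer steps they imply. It assumes a single run over all 4,404 sentences, that the final partial batch is kept, and that there is no gradient accumulation; none of these details are stated above.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    # Values reported in the training description above.
    num_examples: int = 4404
    num_labels: int = 3
    epochs: int = 4
    batch_size: int = 8
    learning_rate: float = 5e-5

    def steps_per_epoch(self) -> int:
        # Assumption: the last, smaller batch is kept rather than dropped.
        return math.ceil(self.num_examples / self.batch_size)

    def total_steps(self) -> int:
        # Assumption: one optimizer step per batch (no gradient accumulation).
        return self.steps_per_epoch() * self.epochs


config = FineTuneConfig()
print(config.steps_per_epoch(), config.total_steps())  # prints "551 2204"
```

Under these assumptions, one epoch is 551 batches and the full run is 2,204 optimizer steps.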
### Evaluation
The model was evaluated with 10-fold cross validation on the annotated data described above. The number of epochs (4), batch size (8), and learning rate (5e-5) were selected during cross validation.

[TABLE 2: CROSS VALIDATION RESULTS]
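As an illustration of the evaluation protocol, here is a minimal sketch of splitting 4,404 examples into 10 folds. It is a plain shuffled split written for this example. Whether the original evaluation stratified the folds by label is not stated, and the function name `k_fold_indices` and the seed are assumptions.

```python
import random


def k_fold_indices(num_examples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)  # the seed is an arbitrary choice
    # Spread the remainder so that fold sizes differ by at most one.
    base, extra = divmod(num_examples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        start += size
        yield train, test


folds = list(k_fold_indices(4404, k=10))
print([len(test) for _, test in folds])
# prints [441, 441, 441, 441, 440, 440, 440, 440, 440, 440]
```

Each sentence lands in exactly one test fold, so every annotated example is scored once by a model that did not see it during training.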