cornelius committed
Commit d9c510c
1 Parent(s): 5094afd

Update README.md

Files changed (1): README.md (+81 -24)

---
license: cc-by-sa-4.0
language:
- de
- en
- es
- da
- pl
- sv
- nl
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- partypress
- political science
- parties
- press releases
---

# PARTYPRESS multilingual

A model fine-tuned on texts in seven languages from parties in nine countries, based on [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased). Used in Erfort et al. (2023), building on the PARTYPRESS database.

## Model description

The PARTYPRESS multilingual model builds on [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) but has a supervised component. This means it was fine-tuned using texts labeled by humans. The labels indicate 23 political issue categories derived from the Comparative Agendas Project (CAP).
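
The mapping from class indices to the 23 categories can be inspected from the model configuration; this is standard Transformers behavior, though if human-readable names were not saved with the checkpoint, the labels may appear as generic `LABEL_0` … `LABEL_22`:

```python
>>> from transformers import AutoConfig
>>> config = AutoConfig.from_pretrained("cornelius/partypress-multilingual")
>>> # id2label maps each of the 23 class indices to its issue category
>>> sorted(config.id2label.items())
```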

## Model variations

We plan to release monolingual models for each of the languages covered by this multilingual model.

## Intended uses & limitations

The main use of the model is the text classification of press releases from political parties. It may also be useful for other political texts.

The classification can then be used to measure which issues parties are discussing in their communication.

### How to use

This model can be used directly with a pipeline for text classification:

```python
>>> from transformers import pipeline
>>> partypress = pipeline("text-classification", model="cornelius/partypress-multilingual", tokenizer="cornelius/partypress-multilingual")
>>> partypress("We urgently need to fight climate change and reduce carbon emissions. This is what our party stands for.")
```
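
The pipeline returns the predicted issue category together with a confidence score, as a list of dictionaries of the form `[{'label': ..., 'score': ...}]` (the standard output of the Transformers text-classification pipeline).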

### Limitations and bias

The model was trained with data from parties in nine countries. For use on texts from other countries, the model may be further fine-tuned; without further fine-tuning, its performance may be lower.

The model may produce biased predictions. We discuss some biases by country, party, and over time in the release paper for the PARTYPRESS database.

## Training data

The PARTYPRESS multilingual model was fine-tuned on 27,243 press releases in seven languages from 68 European parties in nine countries. The press releases were labeled by two expert human coders per country.

For the training data of the underlying model, please refer to [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased).

## Training procedure

### Preprocessing

For the preprocessing, please refer to [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased).

### Pretraining

For the pretraining, please refer to [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased).

### Fine-tuning

The model was fine-tuned on the labeled press releases described under "Training data" above.
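
The exact fine-tuning configuration is not documented in this card. As a rough illustration only, continued fine-tuning on additional labeled press releases could look like the following sketch using the Transformers `Trainer` API; the file `press_releases.csv`, its column names, and all hyperparameters are placeholders, not the authors' settings.

```python
# Hypothetical sketch only: continued fine-tuning on additional labeled
# press releases. File name, columns, and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "cornelius/partypress-multilingual"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Reuses the existing 23-class head (add from_tf=True if the repo only ships TF weights).
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Placeholder CSV with a "text" column and an integer "label" column (0-22).
dataset = load_dataset("csv", data_files={"train": "press_releases.csv"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="partypress-finetuned",
        num_train_epochs=3,              # placeholder
        per_device_train_batch_size=16,  # placeholder
    ),
    train_dataset=tokenized["train"],
    tokenizer=tokenizer,  # enables dynamic padding via DataCollatorWithPadding
)
trainer.train()
```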

## Evaluation results

Fine-tuned on our downstream task, the model achieves the following results in a five-fold cross-validation, which are comparable to the performance of our expert human coders:

| Accuracy | Precision | Recall | F1 score |
|:--------:|:---------:|:------:|:--------:|
|  69.52   |   67.99   |  67.60 |  66.77   |

Note that the classification task is hard because some topics, such as environment and energy, are difficult to keep apart.

When we aggregate the shares of text for each issue, we find that the root-mean-square error is very low (0.29).
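
As an illustration of such an aggregation, the following hypothetical sketch classifies a list of press releases and computes the share of each predicted issue; the example texts, and the printed label names, are placeholders, not the procedure used in the paper.

```python
# Hypothetical sketch: estimate issue attention shares for one party by
# classifying its press releases and counting the predicted labels.
from collections import Counter
from transformers import pipeline

partypress = pipeline("text-classification", model="cornelius/partypress-multilingual")

texts = [
    "We urgently need to fight climate change and reduce carbon emissions.",
    "Our party demands higher wages and better protection for workers.",
]  # placeholder press releases

predictions = partypress(texts, truncation=True)
counts = Counter(p["label"] for p in predictions)
shares = {label: n / len(texts) for label, n in counts.items()}
print(shares)  # e.g. {'Environment': 0.5, 'Labor': 0.5} -- label names illustrative
```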
 
 
### BibTeX entry and citation info

```bibtex
@article{erfort_partypress_2023,
  author  = {Cornelius Erfort and
             Lukas F. Stoetzer and
             Heike Klüver},
  title   = {The PARTYPRESS Database: A New Comparative Database of Parties’ Press Releases},
  journal = {Research and Politics},
  volume  = {forthcoming},
  year    = {2023},
}
```