---
license: cc-by-sa-4.0
language:
- de
- en
- es
- da
- pl
- sv
- nl
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- partypress
- political science
- parties
- press releases
---

# PARTYPRESS multilingual

Fine-tuned model in seven languages on texts from nine countries, based on [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased). Used in Erfort et al. (2023), building on the PARTYPRESS database.

## Model description

The PARTYPRESS multilingual model builds on [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) but has a supervised component. This means it was fine-tuned on texts labeled by human coders. The labels indicate 23 political issue categories derived from the Comparative Agendas Project (CAP).

## Model variations

We plan to release monolingual models for each of the languages covered by this multilingual model.

## Intended uses & limitations

The main use of the model is text classification of press releases from political parties. It may also be useful for other political texts.

The classification can then be used to measure which issues parties discuss in their communication.

### How to use

This model can be used directly with a pipeline for text classification:

```python
>>> from transformers import pipeline
>>> partypress = pipeline("text-classification", model="cornelius/partypress-multilingual", tokenizer="cornelius/partypress-multilingual")
>>> partypress("We urgently need to fight climate change and reduce carbon emissions. This is what our party stands for.")
```
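By default the pipeline returns only the single best label. A small post-processing sketch for working with the scores of several categories at once (the `scores` list below mimics the output of calling the pipeline with `top_k=None` in recent transformers versions; the label names are hypothetical):

```python
# Hypothetical pipeline output: one {label, score} dict per issue category.
scores = [
    {"label": "environment", "score": 0.61},
    {"label": "energy", "score": 0.24},
    {"label": "macroeconomics", "score": 0.05},
    {"label": "transportation", "score": 0.03},
]

def top_issues(scores, k=3, threshold=0.02):
    """Return the k most probable issue labels above a minimum score."""
    ranked = sorted(scores, key=lambda s: s["score"], reverse=True)
    return [s["label"] for s in ranked[:k] if s["score"] >= threshold]

print(top_issues(scores))  # ['environment', 'energy', 'macroeconomics']
```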

### Limitations and bias

The model was trained with data from parties in nine countries. For use in other countries, the model may need further fine-tuning; without it, performance may be lower.

The model may produce biased predictions. We discuss some biases by country, party, and over time in the release paper for the PARTYPRESS database.

## Training data

The PARTYPRESS multilingual model was fine-tuned on 27,243 press releases in seven languages from 68 European parties in nine countries. The press releases were labeled by two expert human coders per country.

For the training data of the underlying model, please refer to [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased).

## Training procedure

### Preprocessing

For the preprocessing, please refer to [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased).

### Pretraining

For the pretraining, please refer to [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased).

### Fine-tuning

## Evaluation results

Fine-tuned on our downstream task, the model achieves the following results in a five-fold cross-validation; they are comparable to the performance of our expert human coders:

| Accuracy | Precision | Recall | F1 score |
|:--------:|:---------:|:------:|:--------:|
| 69.52    | 67.99     | 67.60  | 66.77    |
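For reference, macro-averaged scores like those in the table can be computed from per-document predictions as follows (a minimal sketch with made-up labels, not the actual evaluation code):

```python
def macro_prf(true, pred):
    """Macro-averaged precision, recall, and F1 over all classes."""
    classes = sorted(set(true) | set(pred))
    ps, rs, fs = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(true, pred))
        fp = sum(t != c and p == c for t, p in zip(true, pred))
        fn = sum(t == c and p != c for t, p in zip(true, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        ps.append(prec); rs.append(rec); fs.append(f1)
    n = len(classes)
    return sum(ps) / n, sum(rs) / n, sum(fs) / n

# Made-up gold labels and predictions for four documents.
true = ["environment", "energy", "environment", "macroeconomics"]
pred = ["environment", "environment", "environment", "macroeconomics"]
print(macro_prf(true, pred))
```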

Note that the classification task is difficult because topics such as environment and energy are often hard to keep apart.

When we aggregate the shares of text for each issue, we find that the root-mean-square error is very low (0.29).
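That aggregation check works roughly like this: per-document labels are turned into issue shares, and predicted shares are compared to human-coded shares by root-mean-square error (a sketch with made-up labels; the real comparison is described in the release paper):

```python
import math

def issue_shares(labels, categories):
    """Share of documents assigned to each issue category."""
    return [labels.count(c) / len(labels) for c in categories]

def rmse(a, b):
    """Root-mean-square error between two share vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

categories = ["environment", "energy", "macroeconomics"]  # hypothetical subset
human = issue_shares(["environment", "energy", "environment"], categories)
model = issue_shares(["environment", "energy", "energy"], categories)
print(round(rmse(human, model), 3))  # 0.272
```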

### BibTeX entry and citation info

```bibtex
@article{erfort_partypress_2023,
  author  = {Cornelius Erfort and
             Lukas F. Stoetzer and
             Heike Klüver},
  title   = {The PARTYPRESS Database: A New Comparative Database of Parties’ Press Releases},
  journal = {Research and Politics},
  volume  = {forthcoming},
  year    = {2023},
}
```