|
--- |
|
license: cc-by-sa-4.0 |
|
language: |
|
- de |
|
- en |
|
- es |
|
- da |
|
- pl |
|
- sv |
|
- nl |
|
metrics: |
|
- accuracy |
|
pipeline_tag: text-classification |
|
tags: |
|
- partypress |
|
- political science |
|
- parties |
|
- press releases |
|
--- |
|
|
|
*Currently, the model only works on German texts.*
|
|
|
# PARTYPRESS multilingual |
|
|
|
A model fine-tuned on texts in seven languages from nine countries, based on [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased). It is used in Erfort et al. (2023).
|
|
|
|
|
## Model description |
|
|
|
The PARTYPRESS multilingual model builds on [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) but adds a supervised component. This means it was fine-tuned on texts labeled by human coders. The labels indicate 23 different political issue categories derived from the Comparative Agendas Project (CAP).
|
|
|
|
|
## Model variations |
|
|
|
We plan to release monolingual models for each of the languages covered by this multilingual model. |
|
|
|
## Intended uses & limitations |
|
|
|
The main use of the model is for text classification of press releases from political parties. It may also be useful for other political texts. |
|
|
|
### How to use |
|
|
|
This model can be used directly with a pipeline for text classification: |
|
|
|
```python |
|
>>> from transformers import pipeline |
|
>>> partypress = pipeline("text-classification", model="cornelius/partypress-multilingual", tokenizer="cornelius/partypress-multilingual")
|
>>> partypress("We urgently need to fight climate change and reduce carbon emissions. This is what our party stands for.") |
|
|
|
``` |
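The pipeline returns a list of dictionaries, each with a predicted `label` and a confidence `score` (when called with `top_k=None`, scores for all categories are returned). A minimal sketch of post-processing such output; the helper name `top_prediction` and the example labels and scores below are illustrative, not actual model output:

```python
# Post-process text-classification pipeline output: each prediction
# is a dict with a "label" (an issue category) and a "score".
def top_prediction(predictions):
    """Return the (label, score) pair with the highest confidence."""
    best = max(predictions, key=lambda p: p["score"])
    return best["label"], best["score"]

# Illustrative output structure for a single press release:
example = [
    {"label": "Environment", "score": 0.91},
    {"label": "Macroeconomics", "score": 0.04},
]
label, score = top_prediction(example)
print(label, score)  # Environment 0.91
```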
|
|
|
### Limitations and bias |
|
|
|
The model was trained with data from parties in nine countries. For use in other countries, the model can be further fine-tuned; without such fine-tuning, its performance may be lower.
|
|
|
The model may have biased predictions. We discuss some biases by country, party, and over time in the release paper for the PARTYPRESS database. |
|
|
|
## Training data |
|
|
|
The PARTYPRESS multilingual model was fine-tuned on 27,243 press releases in seven languages from 68 European parties in nine countries. The press releases were labeled by two expert human coders per country.
|
|
|
For the training data of the underlying model, please refer to [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased).
|
|
|
## Training procedure |
|
|
|
### Preprocessing |
|
|
|
For the preprocessing, please refer to [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased).
|
|
|
### Pretraining |
|
|
|
For the pretraining, please refer to [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased).
|
|
|
### Fine-tuning |
|
|
|
|
|
## Evaluation results |
|
|
|
Fine-tuned on our downstream task, this model achieves the following results in a five-fold cross-validation:
|
|
|
| Accuracy | Precision | Recall | F1 score | |
|
|:--------:|:---------:|:-------:|:--------:| |
|
| 69.52 | 67.99 | 67.60 | 66.77 | |
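The precision, recall, and F1 figures above are macro-averages over the issue categories. A minimal pure-Python sketch of how such macro-averaged scores are computed from true and predicted labels; the toy labels below are illustrative and do not correspond to the reported results:

```python
def macro_scores(y_true, y_pred):
    """Macro-averaged precision, recall, and F1 over all classes."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

# Toy example with two hypothetical issue categories:
p, r, f = macro_scores(["Env", "Env", "Econ"], ["Env", "Econ", "Econ"])
```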
|
|
|
### BibTeX entry and citation info |
|
|
|
```bibtex |
|
@article{erfort_partypress_2023, |
|
author = {Cornelius Erfort and |
|
Lukas F. Stoetzer and |
|
Heike Klüver}, |
|
title = {The PARTYPRESS Database: A New Comparative Database of Parties’ Press Releases}, |
|
journal = {Research and Politics}, |
|
volume = {forthcoming}, |
|
year = {2023}, |
|
} |
|
``` |
|
|
|
|
|
|
|
|