---
license: cc-by-sa-4.0
language:
- de
- en
- es
- da
- pl
- sv
- nl
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- partypress
- political science
- parties
- press releases
---
*Currently, the model only works on German texts.*
# PARTYPRESS multilingual
A model fine-tuned on texts in seven languages from nine countries, based on [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased). It is used in Erfort et al. (2023).
## Model description
The PARTYPRESS multilingual model builds on [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) but adds a supervised component: it was fine-tuned on texts labeled by human coders. The labels indicate 23 political issue categories derived from the Comparative Agendas Project (CAP).
## Model variations
We plan to release monolingual models for each of the languages covered by this multilingual model.
## Intended uses & limitations
The main use of the model is for text classification of press releases from political parties. It may also be useful for other political texts.
### How to use
This model can be used directly with a pipeline for text classification:
```python
from transformers import pipeline

# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub
partypress = pipeline(
    "text-classification",
    model="cornelius/partypress-multilingual",
    tokenizer="cornelius/partypress-multilingual",
)

# Returns the most likely of the 23 issue categories and its score
partypress("We urgently need to fight climate change and reduce carbon emissions. This is what our party stands for.")
```
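If you need scores for all 23 categories rather than only the top label, the pipeline's `top_k` argument can be used (a minimal sketch; it assumes a recent `transformers` version in which `top_k=None` makes the text-classification pipeline return all scores):
```python
from transformers import pipeline

partypress = pipeline(
    "text-classification",
    model="cornelius/partypress-multilingual",
    tokenizer="cornelius/partypress-multilingual",
    top_k=None,  # return scores for all categories instead of only the best one
)

scores = partypress("We urgently need to fight climate change and reduce carbon emissions.")
# scores[0] is a list of {"label": ..., "score": ...} dicts, one per category
for entry in sorted(scores[0], key=lambda d: d["score"], reverse=True)[:3]:
    print(entry["label"], round(entry["score"], 3))
```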
### Limitations and bias
The model was trained on data from parties in nine countries. For use in other countries, it may be further fine-tuned; without further fine-tuning, performance may be lower.
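As a rough illustration of what such further fine-tuning could look like, here is a minimal sketch using the `transformers` Trainer API. It is not the authors' original training code, and the toy dataset and category indices below are hypothetical placeholders:
```python
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Hypothetical toy data; real fine-tuning needs labeled press releases
# with labels in 0..22 matching the 23 PARTYPRESS categories.
data = Dataset.from_dict({
    "text": [
        "Our party demands stricter climate targets.",
        "We call for lower taxes on small businesses.",
    ],
    "label": [6, 0],  # hypothetical category indices
})

tokenizer = AutoTokenizer.from_pretrained("cornelius/partypress-multilingual")
model = AutoModelForSequenceClassification.from_pretrained(
    "cornelius/partypress-multilingual"  # keeps the existing 23-label head
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

data = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="partypress-finetuned", num_train_epochs=1),
    train_dataset=data,
)
trainer.train()
```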
The model's predictions may be biased. We discuss biases by country and party, as well as over time, in the paper accompanying the PARTYPRESS database.
## Training data
The PARTYPRESS multilingual model was fine-tuned on 27,243 press releases in seven languages from 68 European parties in nine countries. The press releases were labeled by two expert human coders per country.
For the training data of the underlying model, please refer to [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased).
## Training procedure
### Preprocessing
For details on preprocessing, please refer to [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased).
### Pretraining
For details on pretraining, please refer to [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased).
### Fine-tuning
The model was fine-tuned for the 23-category classification task on the labeled press releases described under Training data above; see Erfort et al. (2023) for details on the fine-tuning procedure.
## Evaluation results
Fine-tuned on our downstream task, this model achieves the following results in a five-fold cross-validation:
| Accuracy (%) | Precision (%) | Recall (%) | F1 score (%) |
|:------------:|:-------------:|:----------:|:------------:|
|    69.52     |     67.99     |   67.60    |    66.77     |
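For reference, metrics like these can be computed with scikit-learn (a sketch under the assumption that the table reports macro averages over the 23 categories; `y_true` and `y_pred` below are hypothetical label indices, not the actual evaluation data):
```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical gold labels and model predictions (category indices 0..22)
y_true = [3, 7, 7, 0, 12, 5]
y_pred = [3, 7, 2, 0, 12, 5]

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
print(f"Accuracy:  {accuracy_score(y_true, y_pred):.2%}")
print(f"Precision: {precision:.2%}  Recall: {recall:.2%}  F1: {f1:.2%}")
```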
## BibTeX entry and citation info
```bibtex
@article{erfort_partypress_2023,
  author  = {Cornelius Erfort and Lukas F. Stoetzer and Heike Klüver},
  title   = {The PARTYPRESS Database: A New Comparative Database of Parties’ Press Releases},
  journal = {Research and Politics},
  volume  = {forthcoming},
  year    = {2023},
}
```