---
license: cc-by-sa-4.0
language:
- de
- en
- es
- da
- pl
- sv
- nl
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- partypress
- political science
- parties
- press releases
widget:
- text: "Labour launches air pollution campaign Labour’s Shadow Minister for the Natural Environment, will today (Wednesday) launch Labour’s campaign against air pollution. Maria Eagle will say that 29,000 people die prematurely in the UK each year because of poor air pollution in our towns and cities - including more than 3,000 in London. Scientists have warned that air pollution in Britain’s most polluted cities is stunting the development of children’s lungs. Maria Eagle MP and Sadiq Khan MP will announce that the next Labour Government will deliver a national framework for Low Emission Zones to enable local authorities to encourage cleaner, greener, less-polluting vehicles to begin to tackle this problem. Unlike this Tory-led Government the Labour Party will devolve the power, not just the responsibility, to Local Authorities willing take action against air pollution."
- text: "Dawn Butler MP, Labour’s Shadow\nWomen and Equalities Secretary, commenting on Equal Pay Day today, said: “From today onwards women effectively work the rest of the year for free, which means fifty days of unpaid labour until we hit 2018. “The fact that the gender pay gap has remained the same for the past three years is a shocking indictment of the Government’s failure to tackle unequal pay and the underlying structural issues that allow these disparities to exist. “Labour is the party of equality. It was Labour that introduced the Equal Pay Act in 1970 and the next Labour Government will take the necessary action to end the scourge of unequal pay once and for all."
---

# PARTYPRESS multilingual

A model fine-tuned in seven languages on texts from nine countries (Austria, Denmark, Germany, Ireland, the Netherlands, Poland, Spain, Sweden, and the UK), based on [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased). Used in Erfort et al. (2023), building on the PARTYPRESS database. For the downstream task of classifying press releases into 23 unique issue areas, we achieve a performance comparable to that of expert human coders.

## Model description

The PARTYPRESS multilingual model builds on [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) but adds a supervised component. This means it was fine-tuned on texts labeled by human coders. The labels indicate 23 political issue categories derived from the Comparative Agendas Project (CAP), e.g., Environment, Immigration, or Labor.

## Model variations

We plan to release monolingual models for each of the languages covered by this multilingual model.

## Intended uses & limitations

The main use of the model is text classification of press releases from political parties. It may also be useful for other political texts. The resulting classification can be used to measure which issues parties address in their communication.

### How to use

This model can be used directly with a pipeline for text classification:

```python
>>> from transformers import pipeline
>>> partypress = pipeline("text-classification", model="cornelius/partypress-multilingual", tokenizer="cornelius/partypress-multilingual")
>>> partypress("We urgently need to fight climate change and reduce carbon emissions. This is what our party stands for.")
```

### Limitations and bias

The model was trained on data from parties in nine countries. For use in other countries, it may need further fine-tuning; without it, performance may be lower. The model's predictions may also be biased. We discuss biases by country, party, and over time in the release paper for the PARTYPRESS database. For example, performance is highest for press releases from Ireland (75%) and lowest for those from Poland (55%).

## Training data

The PARTYPRESS multilingual model was fine-tuned on 27,243 press releases in seven languages from 68 European parties in nine countries. The press releases were labeled by two expert human coders per country. For the training data of the underlying model, please refer to [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased).

## Training procedure

### Preprocessing

For the preprocessing, please refer to [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased).

### Pretraining

For the pretraining, please refer to [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased).

### Fine-tuning

We fine-tuned the model on 27,243 labeled press releases from political parties in seven languages.

#### Training Hyperparameters

The training batch size was 12 and the evaluation batch size was 2, with four training epochs. All other hyperparameters were the defaults from the transformers library.

## Evaluation results

Fine-tuned on our downstream task, the model achieves the following results in a five-fold cross-validation, which are comparable to the performance of our expert human coders:

| Accuracy | Precision | Recall | F1 score |
|:--------:|:---------:|:-------:|:--------:|
| 69.52 | 67.99 | 67.60 | 66.77 |

Note that the classification task is challenging because topics such as environment and energy are often difficult to keep apart. When we aggregate the shares of text for each issue, we find that the root-mean-square error is very low (0.29).
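To make the aggregation concrete: for each of the 23 issue categories, the share of documents the model assigns to that issue is compared with the share assigned by the human coders, and the root-mean-square error is computed over the categories. The snippet below is a minimal illustrative sketch with hypothetical labels, not the original evaluation code, and the 0.29 reported above refers to the full PARTYPRESS data:

```python
import numpy as np

# Minimal illustrative sketch (not the original evaluation code):
# compare the share of documents the model assigns to each of the 23 issue
# categories with the share assigned by human coders, then compute the
# root-mean-square error over the categories.

n_issues = 23
model_labels = np.array([3, 7, 7, 12, 3, 0, 7, 12])   # hypothetical model predictions (issue codes)
human_labels = np.array([3, 7, 12, 12, 3, 0, 7, 12])  # hypothetical human codes for the same texts

model_shares = np.bincount(model_labels, minlength=n_issues) / len(model_labels)
human_shares = np.bincount(human_labels, minlength=n_issues) / len(human_labels)

rmse = np.sqrt(np.mean((model_shares - human_shares) ** 2))
print(f"RMSE of issue shares: {rmse:.3f}")
```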
### BibTeX entry and citation info

```bibtex
@article{erfort_partypress_2023,
  author  = {Cornelius Erfort and Lukas F. Stoetzer and Heike Klüver},
  title   = {The PARTYPRESS Database: A New Comparative Database of Parties’ Press Releases},
  journal = {Research and Politics},
  volume  = {forthcoming},
  year    = {2023},
}
```

### Further resources

GitHub: [cornelius-erfort/partypress](https://github.com/cornelius-erfort/partypress)

Research and Politics Dataverse: [Replication Data for: The PARTYPRESS Database: A New Comparative Database of Parties’ Press Releases](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi%3A10.7910%2FDVN%2FOINX7Q)

## Contact

Cornelius Erfort

Humboldt-Universität zu Berlin

[corneliuserfort.de](https://corneliuserfort.de)