---
license: cc-by-4.0
library_name: transformers
language:
- de
pipeline_tag: text-classification
---

# Model Card for XLM-R-Large-Polarization-Classifier

A fine-tuned [XLM-R Large](https://huggingface.co/FacebookAI/xlm-roberta-large) model for classifying sentences as polarizing or not. The taxonomy for polarizing claims follows Ashraf et al. 2024. The model was first trained on a Telegram dataset that was annotated by GPT-4o using this [prompt](https://huggingface.co/Sami92/XLM-R-Large-Polarization-Classifier/blob/main/PolarizationPrompt_GPT.txt). In a second step, it was fine-tuned on the data from Ashraf et al. 2024.


## Model Details


## Bias, Risks, and Limitations


[More Information Needed]


## How to Get Started with the Model

```python
import torch
from transformers import pipeline

# Example German social media posts to classify.
texts = [
    'Afghanistan - Warum die Taliban Frauenrechte immer mehr einschränken\nhttps://t.co/rhwOdNoJUx',
    '#Münster #G7 oder "Ab jetzt außen rumfahren". https://t.co/Goj5vtrnst',
    'Interessantes Trio.\nDie eine hat eine Wahl vergeigt, die andere kungelt mit Putin und die Dritte hat die Hilfe nach der Flutkatastrophe nicht auf die Reihe bekommen. \nMehr Frauen an die Macht!',
    'Wie kann man sich #AnneWill betrachten ohne das übertragende Gerät zu zerschmettern. Eben 20 sec. dem #FDP Watschengesicht beim Quaken zugehört. Du lieber Himmel, wie weltfremd geht´s denn noch.',
]

checkpoint = "Sami92/XLM-R-Large-Polarization-Classifier"
# Pad each batch to its longest post and truncate inputs longer than 512 tokens.
tokenizer_kwargs = {"padding": True, "truncation": True, "max_length": 512}

# Use the first GPU when one is available, otherwise fall back to the CPU.
device = 0 if torch.cuda.is_available() else -1
polarization_classifier = pipeline(
    "text-classification",
    model=checkpoint,
    tokenizer=checkpoint,
    device=device,
    **tokenizer_kwargs,
)
predictions = polarization_classifier(texts)
print(predictions)
```
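
The `text-classification` pipeline returns one dictionary per input, each with a `label` and a `score` key. A minimal sketch for pairing predictions with their texts and keeping only the confident polarizing ones could look like the following. The helper name `select_polarizing`, the label string `"polarization"`, and the 0.5 threshold are assumptions made for illustration; check `model.config.id2label` for the label names the checkpoint actually uses.

```python
from typing import Dict, List, Tuple, Union

Prediction = Dict[str, Union[str, float]]


def select_polarizing(
    texts: List[str],
    predictions: List[Prediction],
    positive_label: str = "polarization",  # assumed label name, not confirmed by this card
    min_score: float = 0.5,  # hypothetical confidence threshold
) -> List[Tuple[str, float]]:
    """Return (text, score) pairs whose predicted label is the positive one."""
    if len(texts) != len(predictions):
        raise ValueError("texts and predictions must have the same length")
    selected = []
    for text, pred in zip(texts, predictions):
        if pred["label"] == positive_label and pred["score"] >= min_score:
            selected.append((text, float(pred["score"])))
    return selected


# Mocked pipeline output in the standard [{"label": ..., "score": ...}] format.
example_texts = ["first post", "second post", "third post"]
example_predictions = [
    {"label": "non-polarization", "score": 0.97},
    {"label": "polarization", "score": 0.91},
    {"label": "polarization", "score": 0.42},
]
print(select_polarizing(example_texts, example_predictions))
```

With real data, the output of the classifier call can be passed in place of the mocked predictions.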

## Training Details

### Training Data


The training data for the weakly supervised stage was taken from Telegram, specifically from a set of about 200 channels that have been the subject of a fact-check by Correctiv, dpa, Faktenfuchs, or AFP. A sample of 5,000 posts was drawn from these channels.

In a second step, the model was fine-tuned on the train split from Ashraf et al. 2024.


#### Training Hyperparameters

Weakly-supervised Training on Telegram Data

- Epochs: 10
- Batch size: 16
- learning_rate: 2e-5
- weight_decay: 0.01
- fp16: True

Supervised Training on Ashraf et al. 2024

- Epochs: 10
- Batch size: 16
- learning_rate: 2e-5
- weight_decay: 0.01
- fp16: True
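
As a rough sketch, the settings listed above could be expressed with the `transformers` `TrainingArguments` class. This assumes the `Trainer` API was used and that the batch size of 16 is per device, neither of which this card states; the function name `build_training_args` and the output directory are hypothetical.

```python
# Hyperparameters listed above, identical for both training stages.
HYPERPARAMS = {
    "num_train_epochs": 10,
    "per_device_train_batch_size": 16,  # assumes the batch size of 16 is per device
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "fp16": True,  # mixed precision; needs a CUDA-capable GPU
}


def build_training_args(output_dir: str):
    """Build TrainingArguments for one stage; output_dir is a hypothetical path."""
    # Imported lazily so the settings can be inspected without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)
```

One configuration per stage can be built this way, with the second stage presumably starting from the checkpoint saved by the first.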

## Evaluation


#### Testing Data

Evaluation was performed on the test split from Ashraf et al. 2024.


### Results
| Category            | Precision | Recall | F1-Score | Support |
|---------------------|:---------:|:------:|:--------:|:-------:|
| **non-polarization** |   0.89    |  0.89  |   0.89   |  1350   |
| **polarization**     |   0.67    |  0.67  |   0.67   |   463   |
|                     |           |        |          |         |
| **Accuracy**         |           |        |   0.83   |  1813   |
| **Macro avg**        |   0.78    |  0.78  |   0.78   |  1813   |
| **Weighted avg**     |   0.83    |  0.83  |   0.83   |  1813   |
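
The aggregate rows can be cross-checked against the per-class rows. The sketch below recomputes the macro average, the weighted average, and the accuracy from the table values. Because the per-class scores are already rounded to two decimals, the recomputed numbers are approximations; the variable names are chosen for illustration only.

```python
# Per-class metrics copied from the results table: (precision, recall, f1, support).
per_class = {
    "non-polarization": (0.89, 0.89, 0.89, 1350),
    "polarization": (0.67, 0.67, 0.67, 463),
}

total_support = sum(m[3] for m in per_class.values())

# Macro average: unweighted mean of the per-class F1 scores.
macro_f1 = sum(m[2] for m in per_class.values()) / len(per_class)

# Weighted average: per-class F1 weighted by each class's share of the test set.
weighted_f1 = sum(m[2] * m[3] for m in per_class.values()) / total_support

# Accuracy equals the support-weighted mean of the per-class recalls.
accuracy = sum(m[1] * m[3] for m in per_class.values()) / total_support

print(f"support:     {total_support}")
print(f"macro F1:    {macro_f1:.2f}")
print(f"weighted F1: {weighted_f1:.2f}")
print(f"accuracy:    {accuracy:.2f}")
```

The gap between the macro average (0.78) and the weighted average (0.83) reflects the class imbalance: roughly three quarters of the test sentences belong to the easier non-polarization class.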


**BibTeX:**

```bibtex

@inproceedings{ashraf_defakts_2024,
	address = {Torino, Italia},
	title = {{DeFaktS}: {A} {German} {Dataset} for {Fine}-{Grained} {Disinformation} {Detection} through {Social} {Media} {Framing}},
	shorttitle = {{DeFaktS}},
	url = {https://aclanthology.org/2024.lrec-main.409},
	booktitle = {Proceedings of the 2024 {Joint} {International} {Conference} on {Computational} {Linguistics}, {Language} {Resources} and {Evaluation} ({LREC}-{COLING} 2024)},
	publisher = {ELRA and ICCL},
	author = {Ashraf, Shaina and Bezzaoui, Isabel and Andone, Ionut and Markowetz, Alexander and Fegert, Jonas and Flek, Lucie},
	editor = {Calzolari, Nicoletta and Kan, Min-Yen and Hoste, Veronique and Lenci, Alessandro and Sakti, Sakriani and Xue, Nianwen},
	year = {2024},
}
```