Danish Offensive Text Detection based on XLM-Roberta-Base

This model is a fine-tuned version of xlm-roberta-base on a dataset consisting of approximately 5 million Facebook comments on DR's public Facebook pages. The labels have been automatically generated using weak supervision, based on the Snorkel framework.

The model achieves SOTA on a test set consisting of 600 Facebook comments annotated using majority vote by three annotators, of which 35.8% were labelled as offensive:

Model Precision Recall F1-score F2-score
alexandrainst/da-offensive-detection-base (this) 74.81% 89.77% 81.61% 86.32%
alexandrainst/da-offensive-detection-small 74.13% 89.30% 81.01% 85.79%
A&ttack 97.32% 50.70% 66.67% 56.07%
alexandrainst/da-hatespeech-detection-small 86.43% 56.28% 68.17% 60.50%
Guscode/DKbert-hatespeech-detection 75.41% 42.79% 54.60% 46.84%

Using the model

You can use the model simply by running the following:

>>> from transformers import pipeline
>>> offensive_text_pipeline = pipeline(model="alexandrainst/da-offensive-detection-base")
>>> offensive_text_pipeline("Din store idiot")
[{'label': 'Offensive', 'score': 0.9997463822364807}]

Processing multiple documents at the same time can be done as follows:

>>> offensive_text_pipeline(["Din store idiot", "ej hvor godt :)"])
[{'label': 'Offensive', 'score': 0.9997463822364807}, {'label': 'Not offensive', 'score': 0.9996451139450073}]

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • gradient_accumulation_steps: 1
  • total_train_batch_size: 32
  • seed: 4242
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • max_steps: 500000
  • fp16: True
  • eval_steps: 1000
  • early_stopping_patience: 100

Framework versions

  • Transformers 4.20.1
  • Pytorch 1.11.0+cu113
  • Datasets 2.3.2
  • Tokenizers 0.12.1
Downloads last month
29
Safetensors
Model size
278M params
Tensor type
I64
·
F32
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for alexandrainst/da-offensive-detection-base

Finetuned
(2698)
this model