metadata
library_name: transformers
language:
  - en
  - fr
  - it
  - es
  - ru
  - uk
  - tt
  - ar
  - hi
  - ja
  - zh
  - he
  - am
  - de
license: openrail++
datasets:
  - textdetox/multilingual_toxicity_dataset
metrics:
  - f1
base_model:
  - FacebookAI/xlm-roberta-large

Multilingual Toxicity Classifier for 15 Languages (2025)

This is an instance of xlm-roberta-large fine-tuned for binary toxicity classification on our updated (2025) dataset, textdetox/multilingual_toxicity_dataset.
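Because the card declares `library_name: transformers`, the checkpoint should load through the standard sequence-classification classes. The following is a minimal sketch under stated assumptions: `MODEL_ID` is a hypothetical placeholder (the card does not give the Hub ID), label names are read from the checkpoint's own `config.id2label` rather than assumed, and `softmax`, `label_scores`, and `classify` are illustrative helper names, not a published API.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical placeholder: this card does not state the model's Hub ID.
MODEL_ID = "path/to/this-model"


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of classifier logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_scores(logits: Sequence[float], id2label: Dict[int, str]) -> Dict[str, float]:
    """Map one row of logits to {label name: probability}."""
    return {id2label[i]: p for i, p in enumerate(softmax(logits))}


def classify(texts: List[str], model_id: str = MODEL_ID) -> List[Dict[str, float]]:
    """Score texts with the fine-tuned checkpoint (requires torch and transformers)."""
    # Imported here so the pure helpers above work without these packages installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        rows = model(**batch).logits.tolist()
    # Label names come from the checkpoint config; this card does not list them.
    return [label_scores(row, model.config.id2label) for row in rows]
```

With `MODEL_ID` pointed at the real checkpoint, `classify(["Have a nice day!"])` returns one `{label: probability}` dict per input. For quick experiments, `transformers.pipeline("text-classification", model=MODEL_ID)` wraps the same tokenize, forward, and softmax steps.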

The model now covers 15 languages from various language families:

  • English (en); F1: 0.9225
  • Russian (ru); F1: 0.9525
  • Ukrainian (uk); F1: 0.96
  • German (de); F1: 0.7325
  • Spanish (es); F1: 0.7125
  • Arabic (ar); F1: 0.6625
  • Amharic (am); F1: 0.5575
  • Hindi (hi); F1: 0.9725
  • Chinese (zh); F1: 0.9175
  • Italian (it); F1: 0.5864
  • French (fr); F1: 0.9235
  • Hinglish (hin); F1: 0.61
  • Hebrew (he); F1: 0.8775
  • Japanese (ja); F1: 0.8773
  • Tatar (tt); F1: 0.5744
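The scores above are reported as F1, but the card does not say which averaging was used (F1 on the toxic class only, or macro-averaged over both classes). For readers reproducing per-language numbers, here is a minimal standard-library sketch of both variants; the 0/1 label encoding (1 = toxic) and the function names are assumptions, not the shared task's official scorer.

```python
from statistics import mean
from typing import Sequence


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for one class treated as positive (assumed encoding: 1 = toxic)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined), so F1 is 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over the two classes (0 and 1)."""
    return mean(binary_f1(y_true, y_pred, positive=c) for c in (0, 1))
```

On one language's test split, `binary_f1` gives the toxic-class score and `macro_f1` the class-balanced one; the two can differ noticeably when the split is imbalanced.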

Citation

This model was prepared for the TextDetox 2025 Shared Task evaluation.

Citation: TBD.