README.md · textdetox/xlmr-large-toxicity-classifier-v2 at dff4c4f9fdb21f1d7e710f9cb1413375df467e37

metadata

library_name: transformers
language:
  - en
  - fr
  - it
  - es
  - ru
  - uk
  - tt
  - ar
  - hi
  - ja
  - zh
  - he
  - am
  - de
license: openrail++
datasets:
  - textdetox/multilingual_toxicity_dataset
metrics:
  - f1
base_model:
  - FacebookAI/xlm-roberta-large

Multilingual Toxicity Classifier for 15 Languages (2025)

This is an instance of xlm-roberta-large fine-tuned for binary toxicity classification on our updated (2025) dataset, textdetox/multilingual_toxicity_dataset.
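A minimal usage sketch with the transformers `pipeline` API follows. It assumes the checkpoint is published under the repository ID shown in the page title and that it loads as a standard sequence-classification model. The helper names `load_classifier` and `classify` are hypothetical and only serve the illustration. This card does not state which output label denotes the toxic class, so the sketch returns the raw label strings rather than interpreting them.

```python
# Assumption: repository ID taken from the page title of this model card.
MODEL_ID = "textdetox/xlmr-large-toxicity-classifier-v2"


def load_classifier(model_id: str = MODEL_ID):
    """Build a text-classification pipeline (downloads the weights on first use)."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify(texts, classifier):
    """Return (text, label, score) triples for a list of input strings.

    `classifier` is any callable that maps a list of strings to a list of
    {"label": ..., "score": ...} dicts, which is what the pipeline returns.
    """
    predictions = classifier(texts)
    return [(text, pred["label"], pred["score"]) for text, pred in zip(texts, predictions)]
```

Calling `classify(["You are amazing!"], load_classifier())` would download the model and return one triple per input. The label strings come from the checkpoint's configuration (`id2label`), so it is worth inspecting them before mapping a label to "toxic" or "neutral".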

The model now covers 15 languages from various language families:

English (en); F1: 0.9225
Russian (ru); F1: 0.9525
Ukrainian (uk); F1: 0.96
German (de); F1: 0.7325
Spanish (es); F1: 0.7125
Arabic (ar); F1: 0.6625
Amharic (am); F1: 0.5575
Hindi (hi); F1: 0.9725
Chinese (zh); F1: 0.9175
Italian (it); F1: 0.5864
French (fr); F1: 0.9235
Hinglish (hin); F1: 0.61
Hebrew (he); F1: 0.8775
Japanese (ja); F1: 0.8773
Tatar (tt); F1: 0.5744
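The scores above vary widely between languages, so a quick summary can help when deciding where the classifier is reliable. The sketch below copies the reported values into a dictionary and derives an unweighted mean plus the languages under a cutoff. The mean is computed here and is not a figure reported by the authors, and the 0.70 cutoff and the helper names are arbitrary choices for illustration.

```python
# Per-language F1 scores copied from the list above.
F1_BY_LANGUAGE = {
    "en": 0.9225, "ru": 0.9525, "uk": 0.96, "de": 0.7325, "es": 0.7125,
    "ar": 0.6625, "am": 0.5575, "hi": 0.9725, "zh": 0.9175, "it": 0.5864,
    "fr": 0.9235, "hin": 0.61, "he": 0.8775, "ja": 0.8773, "tt": 0.5744,
}


def mean_f1(scores):
    """Unweighted mean over languages (derived here, not reported in the card)."""
    return sum(scores.values()) / len(scores)


def languages_below(scores, threshold):
    """Language codes whose F1 is strictly below the threshold, sorted by code."""
    return sorted(lang for lang, f1 in scores.items() if f1 < threshold)


print(f"languages: {len(F1_BY_LANGUAGE)}")
print(f"unweighted mean F1: {mean_f1(F1_BY_LANGUAGE):.4f}")
# 0.70 is a hypothetical cutoff chosen only for this example.
print("F1 below 0.70:", languages_below(F1_BY_LANGUAGE, 0.70))
```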

Citation

The model was prepared for the TextDetox 2025 Shared Task evaluation.

Citation to be added soon.