metadata
library_name: transformers
language:
- en
- fr
- it
- es
- ru
- uk
- tt
- ar
- hi
- ja
- zh
- he
- am
- de
license: openrail++
datasets:
- textdetox/multilingual_toxicity_dataset
metrics:
- f1
base_model:
- FacebookAI/xlm-roberta-large
Multilingual Toxicity Classifier for 15 Languages (2025)
This is an instance of xlm-roberta-large fine-tuned for binary toxicity classification on our updated (2025) dataset, textdetox/multilingual_toxicity_dataset.
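A minimal usage sketch, assuming the checkpoint is loaded through the standard `transformers` text-classification pipeline. The helper names `build_classifier` and `label_texts` are hypothetical, and the repository id is left as a parameter because this card does not state it. Label names come from the checkpoint's own config, so inspect them rather than assuming which one means "toxic".

```python
def build_classifier(model_id):
    """Create a text-classification pipeline for the given Hub repository id.

    `model_id` is a placeholder argument: pass the repository id of this model.
    The import is local so that `label_texts` can be used without transformers.
    """
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def label_texts(classifier, texts):
    """Run the classifier on a list of strings.

    Returns one (text, label, score) tuple per input, where label and score
    are taken from the top prediction the pipeline reports for that text.
    """
    results = classifier(texts)
    return [(text, r["label"], r["score"]) for text, r in zip(texts, results)]


# Example (downloads the model weights on first use):
# clf = build_classifier("<repository id of this model>")
# for text, label, score in label_texts(clf, ["Have a nice day!", "Hab einen schönen Tag!"]):
#     print(f"{label}\t{score:.3f}\t{text}")
```

Keeping model loading separate from `label_texts` means the pipeline is built once and reused across batches.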
The model now covers 15 languages from various language families:
- English (en); F1: 0.9225
- Russian (ru); F1: 0.9525
- Ukrainian (uk); F1: 0.96
- German (de); F1: 0.7325
- Spanish (es); F1: 0.7125
- Arabic (ar); F1: 0.6625
- Amharic (am); F1: 0.5575
- Hindi (hi); F1: 0.9725
- Chinese (zh); F1: 0.9175
- Italian (it); F1: 0.5864
- French (fr); F1: 0.9235
- Hinglish (hin); F1: 0.61
- Hebrew (he); F1: 0.8775
- Japanese (ja); F1: 0.8773
- Tatar (tt); F1: 0.5744
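To read the per-language scores at a glance, the sketch below computes their unweighted mean and lists the languages that fall under a cutoff. The scores are copied from the list above. The 0.70 cutoff and the names `F1_SCORES`, `mean_f1`, and `languages_below` are illustrative assumptions, not part of the official evaluation.

```python
# Per-language F1 scores, copied from the list above.
F1_SCORES = {
    "en": 0.9225, "ru": 0.9525, "uk": 0.96, "de": 0.7325, "es": 0.7125,
    "ar": 0.6625, "am": 0.5575, "hi": 0.9725, "zh": 0.9175, "it": 0.5864,
    "fr": 0.9235, "hin": 0.61, "he": 0.8775, "ja": 0.8773, "tt": 0.5744,
}


def mean_f1(scores):
    """Unweighted mean of the per-language F1 scores."""
    return sum(scores.values()) / len(scores)


def languages_below(scores, cutoff=0.70):
    """Language codes scoring under `cutoff` (an arbitrary, assumed value), weakest first."""
    return sorted((lang for lang, f1 in scores.items() if f1 < cutoff), key=scores.get)


print(f"Mean F1 over {len(F1_SCORES)} languages: {mean_f1(F1_SCORES):.4f}")
print("Below cutoff:", ", ".join(languages_below(F1_SCORES)))
```

Predictions for the lower-scoring languages deserve extra scrutiny before they are relied on downstream.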
Citation
The model was prepared for the TextDetox 2025 Shared Task evaluation.
Citation to be added.