---
library_name: transformers
language:
- en
- fr
- it
- es
- ru
- uk
- tt
- ar
- hi
- ja
- zh
- he
- am
- de
license: openrail++
datasets:
- textdetox/multilingual_toxicity_dataset
metrics:
- f1
base_model:
- FacebookAI/xlm-roberta-large
---

# Multilingual Toxicity Classifier for 15 Languages (2025)
This is an instance of `xlm-roberta-large` fine-tuned on a binary toxicity classification task using our updated (2025) dataset `textdetox/multilingual_toxicity_dataset`.

The model now covers 15 languages from various language families:
- English (en)
- Russian (ru)
- Ukrainian (uk)
- German (de)
- Spanish (es)
- Arabic (ar)
- Amharic (am)
- Hindi (hi)
- Chinese (zh)
- Italian (it)
- French (fr)
- Hinglish (hin)
- Hebrew (he)
- Japanese (ja)
- Tatar (tt)
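A minimal inference sketch, assuming the checkpoint loads with the standard `transformers` sequence-classification classes (the `library_name: transformers` metadata suggests so). The `MODEL_ID` value is a hypothetical placeholder because this card does not state the Hub ID, and treating class index 1 as "toxic" is an assumption to verify against `model.config.id2label`.

```python
import math
from typing import List, Sequence

# Hypothetical placeholder: the card does not state this checkpoint's Hub ID.
MODEL_ID = "<hub-id-of-this-model>"

# Assumption: class index 1 is "toxic" and index 0 is "non-toxic".
# Verify against model.config.id2label before relying on it.
TOXIC_INDEX = 1


def toxic_probability(logits: Sequence[float]) -> float:
    """Softmax over the two class logits; return P(assumed toxic class)."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    return exps[TOXIC_INDEX] / sum(exps)


def classify(texts: List[str], model_id: str = MODEL_ID, threshold: float = 0.5) -> List[dict]:
    """Score texts with the fine-tuned XLM-R classifier (needs torch and transformers)."""
    # Imported lazily so toxic_probability() stays usable without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()  # disable dropout for inference

    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        # One row of class logits per input text (class order assumed, see above).
        logits = model(**batch).logits.tolist()

    results = []
    for text, row in zip(texts, logits):
        prob = toxic_probability(row)
        results.append({"text": text, "toxic_prob": prob, "is_toxic": prob >= threshold})
    return results
```

After replacing `MODEL_ID` with the real Hub ID, `classify(["You are wonderful.", "Tu es un idiot."])` returns one dict per input text. Keeping the softmax in a separate pure helper lets the thresholding logic be checked without downloading the model.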
Evaluation results (F1) on the test set:

| Language | F1 |
|---|---|
| en | 0.9650 |
| ru | 0.9790 |
| uk | 0.9251 |
| de | 0.8758 |
| es | 0.8700 |
| ar | 0.7780 |
| am | 0.7780 |
| hi | 0.9360 |
| zh | 0.7315 |
| it | |
| fr | |
| hin | |
| he | |
| ja | |
| tt | |

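The card does not say which F1 variant produced the table: binary F1 on the toxic class or macro-F1 averaged over both classes. As a minimal sketch, assuming gold and predicted labels are 0/1 integers with 1 = toxic, per-language scores under either variant could be computed like this; scikit-learn's `f1_score` is the usual off-the-shelf alternative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def binary_f1(y_true: List[int], y_pred: List[int], positive: int = 1) -> float:
    """F1 for one class, treating `positive` as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined), so F1 is 0 by convention
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def per_language_f1(rows: Iterable[Tuple[str, int, int]], average: str = "binary") -> Dict[str, float]:
    """rows are (language, gold, predicted) with 1 = toxic (assumed encoding)."""
    by_lang = defaultdict(lambda: ([], []))
    for lang, gold, pred in rows:
        by_lang[lang][0].append(gold)
        by_lang[lang][1].append(pred)

    scores = {}
    for lang, (gold, pred) in by_lang.items():
        if average == "binary":  # F1 of the toxic class only
            scores[lang] = binary_f1(gold, pred, positive=1)
        elif average == "macro":  # unweighted mean of the per-class F1 scores
            scores[lang] = (binary_f1(gold, pred, 1) + binary_f1(gold, pred, 0)) / 2
        else:
            raise ValueError(f"unsupported average: {average!r}")
    return scores
```

Feeding it the classifier's 0/1 predictions on each language's test split yields one score per table row; whether those match the reported numbers depends on the F1 variant and test data actually used.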
## Citation

The model was prepared for the TextDetox 2025 Shared Task evaluation. Citation: TBD.