metadata
library_name: transformers
language:
  - en
  - fr
  - it
  - es
  - ru
  - uk
  - tt
  - ar
  - hi
  - ja
  - zh
  - he
  - am
  - de
license: openrail++
datasets:
  - textdetox/multilingual_toxicity_dataset
metrics:
  - f1
base_model:
  - FacebookAI/xlm-roberta-large

Multilingual Toxicity Classifier for 15 Languages (2025)

This is an instance of xlm-roberta-large fine-tuned for binary toxicity classification on our updated (2025) dataset, textdetox/multilingual_toxicity_dataset.

The model now covers 15 languages from various language families:

  • English (en)
  • Russian (ru)
  • Ukrainian (uk)
  • German (de)
  • Spanish (es)
  • Arabic (ar)
  • Amharic (am)
  • Hindi (hi)
  • Chinese (zh)
  • Italian (it)
  • French (fr)
  • Hinglish (hin)
  • Hebrew (he)
  • Japanese (ja)
  • Tatar (tt)
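
The classifier can be loaded like any other sequence-classification checkpoint in transformers. The snippet below is a minimal sketch, not an official usage recipe: `MODEL_ID` is a placeholder (this card does not spell out the repository id, so substitute the one shown on the model page), and the helper names `build_classifier` and `classify` are hypothetical. The label strings returned depend on the checkpoint's `id2label` config, which is not documented here, so inspect them before mapping labels to "toxic" / "non-toxic".

```python
from typing import Callable, List, Tuple

# Placeholder: replace with the actual repository id of this model.
MODEL_ID = "<namespace>/<this-model>"


def build_classifier(model_id: str = MODEL_ID):
    """Create a transformers text-classification pipeline for the checkpoint."""
    # Imported lazily so the helpers below can be used without transformers.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify(texts: List[str], classifier: Callable) -> List[Tuple[str, float]]:
    """Run the classifier on a batch of texts and return (label, score) pairs."""
    # truncation=True clips inputs that exceed the model's maximum length.
    results = classifier(texts, truncation=True)
    return [(r["label"], float(r["score"])) for r in results]


# Example (downloads the checkpoint, so it is left commented out):
# clf = build_classifier()
# print(classify(["Have a nice day!", "Bonne journée !"], clf))
```

Because a single checkpoint serves all listed languages, no language id needs to be passed; texts in different languages can be mixed in one batch.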

The evaluation results on the test set are the following:

Language  F1
en        0.9650
ru        0.9790
uk        0.9251
de        0.8758
es        0.8700
ar        0.7780
am        0.7780
hi        0.9360
zh        0.7315
it
fr
hin
he
ja
tt
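
For readers who want to produce a comparable per-language breakdown on their own data, the following sketch computes F1 separately for each language. It assumes binary F1 on the toxic class (label `1`); the card does not state which F1 variant (binary, macro, or weighted) was used for the numbers above, so treat this as one plausible reading. The function name `f1_per_language` and the record layout are illustrative, not part of the original evaluation code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def f1_per_language(
    records: Iterable[Tuple[str, int, int]], positive: int = 1
) -> Dict[str, float]:
    """Compute binary F1 per language from (lang, gold, pred) triples."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for lang, gold, pred in records:
        c = counts[lang]
        if pred == positive and gold == positive:
            c["tp"] += 1  # correctly flagged as toxic
        elif pred == positive:
            c["fp"] += 1  # flagged as toxic, but gold is non-toxic
        elif gold == positive:
            c["fn"] += 1  # toxic example that was missed

    scores = {}
    for lang, c in counts.items():
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0.
        denom = 2 * c["tp"] + c["fp"] + c["fn"]
        scores[lang] = 2 * c["tp"] / denom if denom else 0.0
    return scores


# Toy data for illustration only (not taken from the test set).
toy = [
    ("en", 1, 1), ("en", 0, 1), ("en", 1, 0), ("en", 0, 0),
    ("uk", 1, 1), ("uk", 0, 0),
]
print(f1_per_language(toy))
```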

Citation

The model was prepared for the TextDetox 2025 Shared Task evaluation.

Citation: to be added.