library_name: transformers
language:
  - en
  - fr
  - it
  - es
  - ru
  - uk
  - tt
  - ar
  - hi
  - ja
  - zh
  - he
  - am
  - de
license: openrail++
datasets:
  - textdetox/multilingual_toxicity_dataset
metrics:
  - f1
base_model:
  - FacebookAI/xlm-roberta-large
pipeline_tag: text-classification

Multilingual Toxicity Classifier for 15 Languages (2025)

This is an instance of xlm-roberta-large fine-tuned for binary toxicity classification on our updated (2025) dataset, textdetox/multilingual_toxicity_dataset.

The model now covers 15 languages from various language families:

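| Language  | Code | F1 Score |
|-----------|------|----------|
| English   | en   | 0.9225   |
| Russian   | ru   | 0.9525   |
| Ukrainian | uk   | 0.96     |
| German    | de   | 0.7325   |
| Spanish   | es   | 0.7125   |
| Arabic    | ar   | 0.6625   |
| Amharic   | am   | 0.5575   |
| Hindi     | hi   | 0.9725   |
| Chinese   | zh   | 0.9175   |
| Italian   | it   | 0.5864   |
| French    | fr   | 0.9235   |
| Hinglish  | hin  | 0.61     |
| Hebrew    | he   | 0.8775   |
| Japanese  | ja   | 0.8773   |
| Tatar     | tt   | 0.5744   |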

How to use

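import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = 'textdetox/xlmr-large-toxicity-classifier-v2'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # disable dropout for inference

# Tokenize the input text into a batch of size 1
batch = tokenizer("You are amazing!", return_tensors="pt")

# Run the forward pass without tracking gradients
with torch.no_grad():
    output = model(**batch)

# idx 0 for neutral, idx 1 for toxic
predicted_class = output.logits.argmax(dim=-1).item()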
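
The snippet above returns raw logits, so a small post-processing step is usually needed to get a label. The following is a minimal sketch of one way to do that, using only the index order stated above (0 for neutral, 1 for toxic). The helper names `softmax` and `label_from_logits` and the `threshold` value of 0.5 are illustrative assumptions, not part of the model or the library.

```python
import math

# Index order as stated in this README: 0 = neutral, 1 = toxic
LABELS = ("neutral", "toxic")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_from_logits(logits, threshold=0.5):
    """Map a [neutral, toxic] logit pair to (label, toxic_probability).

    `threshold` is a hypothetical default chosen for illustration;
    it is not a value recommended by the model authors.
    """
    toxic_prob = softmax(logits)[1]
    label = LABELS[1] if toxic_prob >= threshold else LABELS[0]
    return label, toxic_prob
```

With the `output` object from the example above, the helper could be called as `label_from_logits(output.logits[0].tolist())`. Given the lower F1 scores reported for some languages, a different threshold may be worth evaluating on your own data.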

Citation

This model was prepared for the TextDetox 2025 Shared Task evaluation.

A citation will be added soon.