textdetox
/

xlmr-large-toxicity-classifier-v2

Text Classification

Inference Endpoints

Model card Files Files and versions Community

xlmr-large-toxicity-classifier-v2 / README.md

dardem's picture

Update README.md

926ad04 verified about 15 hours ago

|

history blame contribute delete

1.85 kB

	---
	library_name: transformers
	language:
	- en
	- fr
	- it
	- es
	- ru
	- uk
	- tt
	- ar
	- hi
	- ja
	- zh
	- he
	- am
	- de
	license: openrail++
	datasets:
	- textdetox/multilingual_toxicity_dataset
	metrics:
	- f1
	base_model:
	- FacebookAI/xlm-roberta-large
	pipeline_tag: text-classification
	---

	## Multilingual Toxicity Classifier for 15 Languages (2025)

	This is an instance of [xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large) that was fine-tuned on binary toxicity classification task based on our updated (2025) dataset [textdetox/multilingual_toxicity_dataset](https://huggingface.co/datasets/textdetox/multilingual_toxicity_dataset).

	Now, the models covers 15 languages from various language families:

	\| Language \| Code \| F1 Score \|
	\|-----------\|------\|---------\|
	\| English \| en \| 0.9225 \|
	\| Russian \| ru \| 0.9525 \|
	\| Ukrainian \| uk \| 0.96 \|
	\| German \| de \| 0.7325 \|
	\| Spanish \| es \| 0.7125 \|
	\| Arabic \| ar \| 0.6625 \|
	\| Amharic \| am \| 0.5575 \|
	\| Hindi \| hi \| 0.9725 \|
	\| Chinese \| zh \| 0.9175 \|
	\| Italian \| it \| 0.5864 \|
	\| French \| fr \| 0.9235 \|
	\| Hinglish \| hin \| 0.61 \|
	\| Hebrew \| he \| 0.8775 \|
	\| Japanese \| ja \| 0.8773 \|
	\| Tatar \| tt \| 0.5744 \|

	## How to use

	```python
	import torch
	from transformers import AutoTokenizer, AutoModelForSequenceClassification

	tokenizer = AutoTokenizer.from_pretrained('textdetox/xlmr-large-toxicity-classifier-v2')
	model = AutoModelForSequenceClassification.from_pretrained('textdetox/xlmr-large-toxicity-classifier-v2')

	batch = tokenizer.encode("You are amazing!", return_tensors="pt")

	output = model(batch)
	# idx 0 for neutral, idx 1 for toxic
	```

	## Citation
	The model is prepared for [TextDetox 2025 Shared Task](https://pan.webis.de/clef25/pan25-web/text-detoxification.html) evaluation.

	Citation TBD soon.