Text Classification
Transformers
Safetensors
xlm-roberta
Inference Endpoints
Committed by dardem
Commit 94331a8 (verified) · 1 Parent(s): b94d83f

Update README.md

Files changed (1)
  1. README.md +53 -1
README.md CHANGED
@@ -15,6 +15,58 @@ language:
  - he
  - am
  - de
+ license: openrail++
+ datasets:
+ - textdetox/multilingual_toxicity_dataset
+ metrics:
+ - f1
+ base_model:
+ - FacebookAI/xlm-roberta-large
  ---
 
- ## Multilingual Toxicity Classifier for 15 Languages
+ ## Multilingual Toxicity Classifier for 15 Languages (2025)
+
+ This is an instance of [xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large) fine-tuned for binary toxicity classification on our updated (2025) dataset, [textdetox/multilingual_toxicity_dataset](https://huggingface.co/datasets/textdetox/multilingual_toxicity_dataset).
+
+ The model now covers 15 languages from various language families:
+ * English (en)
+ * Russian (ru)
+ * Ukrainian (uk)
+ * German (de)
+ * Spanish (es)
+ * Arabic (ar)
+ * Amharic (am)
+ * Hindi (hi)
+ * Chinese (zh)
+ * Italian (it)
+ * French (fr)
+ * Hinglish (hin)
+ * Hebrew (he)
+ * Japanese (ja)
+ * Tatar (tt)
+
+ The evaluation results on the test set are as follows:
+
+ | Language | F1     |
+ |----------|--------|
+ | en       | 0.9650 |
+ | ru       | 0.9790 |
+ | uk       | 0.9251 |
+ | de       | 0.8758 |
+ | es       | 0.8700 |
+ | ar       | 0.7780 |
+ | am       | 0.7780 |
+ | hi       | 0.9360 |
+ | zh       | 0.7315 |
+ | it       |        |
+ | fr       |        |
+ | hin      |        |
+ | he       |        |
+ | ja       |        |
+ | tt       |        |
+
+
+ ## Citation
+ The model was prepared for the TextDetox 2025 Shared Task evaluation.
+
+ Citation to be added soon.
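
The updated README reports one F1 score per language (`metrics: f1`), but this diff does not show how the scores were computed. As a minimal sketch, assuming the metric is binary F1 on the toxic class and that every test example carries a language code, per-language scores could be computed roughly as follows. The names `f1_binary`, `per_language_f1`, and `stub_predict`, the label convention (1 = toxic), and the toy examples are all illustrative assumptions, not part of the repository.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Assumed label convention (not stated in the README): 1 = toxic, 0 = non-toxic.
Example = Tuple[str, str, int]  # (language code, text, gold label)


def f1_binary(gold: List[int], pred: List[int]) -> float:
    """Binary F1 on the positive class; returns 0.0 when there are no true positives."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def per_language_f1(
    examples: Iterable[Example], predict: Callable[[str], int]
) -> Dict[str, float]:
    """Group examples by language code and score each group separately."""
    gold, pred = defaultdict(list), defaultdict(list)
    for lang, text, label in examples:
        gold[lang].append(label)
        pred[lang].append(predict(text))
    return {lang: f1_binary(gold[lang], pred[lang]) for lang in gold}


# Hypothetical stand-in for the classifier: flags any text containing "idiot".
def stub_predict(text: str) -> int:
    return int("idiot" in text.lower())


# Toy data for illustration only; not drawn from the actual dataset.
toy_examples = [
    ("en", "You are an idiot", 1),
    ("en", "Have a nice day", 0),
    ("en", "Get lost, loser", 1),
    ("de", "Du bist ein Idiot", 1),
    ("de", "Guten Morgen", 0),
]
scores = per_language_f1(toy_examples, stub_predict)
```

In practice `predict` would wrap the fine-tuned model, for example through a `transformers` text-classification pipeline. The model's repository ID and its output label names are not given in this diff, so they are left out here.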