Update README.md
README.md
CHANGED
@@ -15,6 +15,58 @@ language:
 - he
 - am
 - de
+license: openrail++
+datasets:
+- textdetox/multilingual_toxicity_dataset
+metrics:
+- f1
+base_model:
+- FacebookAI/xlm-roberta-large
 ---
 
-## Multilingual Toxicity Classifier for 15 Languages
+## Multilingual Toxicity Classifier for 15 Languages (2025)
+
+This is an instance of [xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large) fine-tuned for binary toxicity classification on our updated (2025) dataset [textdetox/multilingual_toxicity_dataset](https://huggingface.co/datasets/textdetox/multilingual_toxicity_dataset).
+
+The model now covers 15 languages from various language families:
+* English (en)
+* Russian (ru)
+* Ukrainian (uk)
+* German (de)
+* Spanish (es)
+* Arabic (ar)
+* Amharic (am)
+* Hindi (hi)
+* Chinese (zh)
+* Italian (it)
+* French (fr)
+* Hinglish (hin)
+* Hebrew (he)
+* Japanese (ja)
+* Tatar (tt)
+
+The evaluation results on the test set are as follows:
+
+| Language | F1     |
+|----------|--------|
+| en       | 0.9650 |
+| ru       | 0.9790 |
+| uk       | 0.9251 |
+| de       | 0.8758 |
+| es       | 0.8700 |
+| ar       | 0.7780 |
+| am       | 0.7780 |
+| hi       | 0.9360 |
+| zh       | 0.7315 |
+| it       |        |
+| fr       |        |
+| hin      |        |
+| he       |        |
+| ja       |        |
+| tt       |        |
+
+
+## Citation
+The model was prepared for the TextDetox 2025 Shared Task evaluation.
+
+Citation TBD soon.
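The results table in the README reports one F1 score per language. As a minimal sketch of how such per-language scores could be computed from gold and predicted labels, the snippet below groups examples by language code and scores each group. The README does not say which F1 variant was used (binary, macro, or weighted), so the binary F1 with toxic = 1 as the positive class is an assumption here. The helper names `binary_f1` and `f1_by_language` and the toy records are hypothetical and are not taken from the model or dataset.

```python
from collections import defaultdict


def binary_f1(gold, pred, positive=1):
    """F1 of the positive class, computed as 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    denom = 2 * tp + fp + fn
    # With no positive gold labels and no positive predictions, return 0.0.
    return 2 * tp / denom if denom else 0.0


def f1_by_language(records):
    """Group (lang, gold, pred) triples by language and score each group."""
    grouped = defaultdict(lambda: ([], []))
    for lang, gold, pred in records:
        grouped[lang][0].append(gold)
        grouped[lang][1].append(pred)
    return {lang: binary_f1(g, p) for lang, (g, p) in grouped.items()}


# Made-up toy records for illustration only (assumption: 1 = toxic, 0 = neutral).
toy = [
    ("en", 1, 1), ("en", 0, 0), ("en", 1, 0), ("en", 0, 1),
    ("de", 1, 1), ("de", 0, 0),
]
scores = f1_by_language(toy)
```

Scoring each language separately, rather than pooling all examples, keeps a weak language visible instead of letting it be averaged away by the larger ones.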