Text Classification
Safetensors
deberta-v2
catherinearnett committed on
Commit
b8a417e
1 Parent(s): f9e8db6

Update README.md

Files changed (1)
  1. README.md +10 -8
README.md CHANGED
@@ -16,14 +16,10 @@ pipeline_tag: text-classification

  # Celadon Toxicity Classifier

- [Pleias](https://huggingface.co/PleIAs)
-

  Celadon is a DeBERTa-v3-small finetune with five classification heads, trained on 600k samples from [Toxic Commons](https://huggingface.co/datasets/PleIAs/ToxicCommons).

- This classifier is primarily aimed at historical cultural heritage data, like that in [Common Corpus](https://huggingface.co/collections/PleIAs/common-corpus-65d46e3ea3980fdcd66a5613)
-
- Five types of toxicity classification:
+ It classifies toxicity along five dimensions:
  * **Race and origin-based bias**: includes racism as well as bias against someone’s country or region of origin or immigration status, especially immigrant or refugee status.
  * **Gender and sexuality-based bias**: includes sexism and misogyny, homophobia, transphobia, and sexual harassment.
  * **Religious bias**: any bias or stereotype based on someone’s religion.
@@ -31,7 +27,7 @@ Five types of toxicity classification:
  * **Violence and abuse**: overly graphic descriptions of violence, threats of violence, or calls or incitement of violence.


- Read more about the training details in the paper, [Toxicity of the Commons: Curating Open-Source Pre-Training Data] by [Catherine Arnett](https://huggingface.co/catherinearnett), [Eliot Jones](https://huggingface.co/eliotj), Ivan P. Yamshchikov, [Pierre-Carl Langlais](https://huggingface.co/Pclanglais).
+ Read more about the training details in the paper, [Toxicity of the Commons: Curating Open-Source Pre-Training Data](https://arxiv.org/pdf/2410.22587) by [Catherine Arnett](https://huggingface.co/catherinearnett), [Eliot Jones](https://huggingface.co/eliotj), Ivan P. Yamshchikov, [Pierre-Carl Langlais](https://huggingface.co/Pclanglais).
  For more detailed code regarding generating the annotations in Toxic Commons, training the model, and using the model, please refer to the official [GitHub](https://github.com/eliotjones1/celadon) repository.


@@ -62,12 +58,18 @@ for i, category in enumerate(categories):
  # How to Cite

  ```
-
+ @article{arnett2024toxicity,
+ title={{Toxicity of the Commons: Curating Open-Source Pre-Training Data}},
+ author={Arnett, Catherine and Jones, Eliot and Yamshchikov, Ivan P. and Langlais, Pierre-Carl},
+ journal={arXiv preprint arXiv:2410.22587},
+ url={https://arxiv.org/pdf/2410.22587},
+ year={2024}
+ }
  ```

  # About

- Trained by [Eliot Jones](https://huggingface.co/eliotj). This project was made possible by Jean Zay compute grant #GC011015451.
+ Trained by [Eliot Jones](https://huggingface.co/eliotj) while working at [Pleias](https://huggingface.co/PleIAs). This project was made possible by Jean Zay compute grant #GC011015451.

  ## About the Name
  Celadon is a type of porcelain, whose European name refers to its jade-like color. The Chinese name for this type of pottery is 青瓷, which means blue-green ceramic. The earliest examples of celadon pottery date from the first century AD. Celadon was first brought to Europe by the Dutch East India Company in the 16th and 17th centuries. In order to increase sales, as the ceramics were very expensive to bring to Europe from China, the Dutch made up fantastical properties of the ceramics, for example that celadon would change color or break in the presence of poison.
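
The README describes Celadon as having five classification heads, and the last hunk header shows a fragment of its usage code (`for i, category in enumerate(categories):`). As a rough illustration of what decoding a multi-head output can look like, here is a minimal pure-Python sketch. It is not the repository's code: the category identifiers paraphrase the README bullets, the fourth category is not visible in this diff and gets a placeholder name, and the number of classes per head (4 here) and the logit values are assumptions made up for the example.

```python
from typing import Dict, List, Sequence

# Category names paraphrase the README bullets. The fourth bullet is not
# visible in this diff, so it gets a placeholder name (assumption).
CATEGORIES: List[str] = [
    "race_origin",
    "gender_sexuality",
    "religion",
    "category_4_not_shown",
    "violence_abuse",
]


def argmax(values: Sequence[float]) -> int:
    """Return the index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def decode_heads(logits_per_head: Sequence[Sequence[float]]) -> Dict[str, int]:
    """Map one row of logits per head to one predicted class per category."""
    if len(logits_per_head) != len(CATEGORIES):
        raise ValueError("expected one logit row per category")
    return {
        category: argmax(logits_per_head[i])
        for i, category in enumerate(CATEGORIES)
    }


# Hypothetical logits: 5 heads x 4 classes (the class count is an assumption).
example_logits = [
    [2.0, 0.1, -1.0, -2.0],
    [0.3, 1.5, 0.2, -0.5],
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.2, 0.1, 0.0],
    [-1.0, 0.0, 0.5, 2.5],
]
predictions = decode_heads(example_logits)
for category, label in predictions.items():
    print(f"{category}: {label}")
```

With a real model, the rows in `example_logits` would come from the model's forward pass; the official GitHub repository linked in the README is the authoritative source for how the heads are actually loaded and decoded.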