Celadon Toxicity Classifier

Celadon is a DeBERTa-v3-small finetune with five classification heads, trained on 600k samples from Toxic Commons.

It classifies toxicity along five dimensions:

Race and origin-based bias: includes racism as well as bias against someone’s country or region of origin or immigration status, especially immigrant or refugee status.
Gender and sexuality-based bias: includes sexism and misogyny, homophobia, transphobia, and sexual harassment.
Religious bias: any bias or stereotype based on someone’s religion.
Ability bias: bias according to someone’s physical, mental, or intellectual ability or disability.
Violence and abuse: overly graphic descriptions of violence, threats of violence, or calls for or incitement to violence.

Read more about the training details in the paper, Toxicity of the Commons: Curating Open-Source Pre-Training Data by Catherine Arnett, Eliot Jones, Ivan P. Yamshchikov, Pierre-Carl Langlais. For more detailed code regarding generating the annotations in Toxic Commons, training the model, and using the model, please refer to the official GitHub repository.

How to Use

import torch
from transformers import AutoTokenizer
from celadon.model import MultiHeadDebertaForSequenceClassification

# Load the tokenizer and the multi-head classifier
tokenizer = AutoTokenizer.from_pretrained("celadon")
model = MultiHeadDebertaForSequenceClassification.from_pretrained("celadon")
model.eval()  # switch off dropout for inference

sample_text = "This is an example of a normal sentence"

inputs = tokenizer(sample_text, return_tensors="pt", padding=True, truncation=True)

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(input_ids=inputs['input_ids'], attention_mask=inputs['attention_mask'])

# One predicted class per head, in the same order as the categories below
categories = ['Race/Origin', 'Gender/Sex', 'Religion', 'Ability', 'Violence']
predictions = outputs.argmax(dim=-1).squeeze().tolist()

# Print the classification results for each category
print(f"Text: {sample_text}")
for i, category in enumerate(categories):
    print(f"Prediction for Category {category}: {predictions[i]}")
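The snippet above reports only the winning class for each head. As a rough illustration of that post-processing step, the sketch below takes one list of logits per head, picks the highest-scoring class, and also reports a softmax confidence. It uses plain Python so it runs without the model. The helper name summarize_heads, the choice of four classes per head, and the logit values are all assumptions made for illustration; they are not taken from the model card or the repository.

```python
import math

CATEGORIES = ["Race/Origin", "Gender/Sex", "Religion", "Ability", "Violence"]


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def summarize_heads(head_logits):
    """Hypothetical helper: map per-head logits to a label and confidence.

    head_logits holds one list of class logits per head, in CATEGORIES order.
    """
    if len(head_logits) != len(CATEGORIES):
        raise ValueError(f"expected {len(CATEGORIES)} heads, got {len(head_logits)}")
    summary = {}
    for category, logits in zip(CATEGORIES, head_logits):
        probs = softmax(logits)
        # argmax over the class dimension, as in outputs.argmax(dim=-1)
        label = max(range(len(probs)), key=probs.__getitem__)
        summary[category] = {"label": label, "confidence": probs[label]}
    return summary


# Made-up logits for a single text: 5 heads x 4 classes (assumed shape).
example_logits = [
    [2.0, 0.1, -1.0, -2.0],
    [0.0, 3.0, 0.5, -1.0],
    [1.5, 1.0, 0.0, -0.5],
    [4.0, 0.0, 0.0, 0.0],
    [-1.0, 0.0, 0.5, 2.5],
]

result = summarize_heads(example_logits)
for category, info in result.items():
    print(f"{category}: label={info['label']} confidence={info['confidence']:.2f}")
```

If the model returns a single tensor with one row of logits per head for a single input, which is what the argmax call above suggests, something like outputs.squeeze(0).tolist() should produce a nested list in the shape this helper expects.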

How to Cite

@article{arnett2024toxicity,
  title={{Toxicity of the Commons: Curating Open-Source Pre-Training Data}},
  author={Arnett, Catherine and Jones, Eliot and Yamshchikov, Ivan P. and Langlais, Pierre-Carl},
  journal={arXiv preprint arXiv:2410.22587},
  url={https://arxiv.org/pdf/2410.22587},
  year={2024}
}

About

Trained by Eliot Jones while working at Pleias. This project was made possible by Jean Zay compute grant #GC011015451.

About the Name

Celadon is a type of porcelain whose European name refers to its jade-like color. The Chinese name for this type of pottery is 青瓷, which means blue-green ceramic. The earliest examples of celadon pottery date from the first century AD. Celadon was first brought to Europe by the Dutch East India Company in the 16th and 17th centuries. Because the ceramics were very expensive to bring to Europe from China, the Dutch invented fantastical properties for them to increase sales, for example that celadon would change color or break in the presence of poison.
