ToxicChat-T5-Large Model Card

Model Details

Model type: ToxicChat-T5-Large is an open-source moderation model obtained by fine-tuning T5-large on ToxicChat. It uses an encoder-decoder transformer architecture and generates a text label indicating whether the input is toxic ('positive' means toxic; 'negative' means non-toxic).

Model date: ToxicChat-T5-Large was trained in January 2024.

Organizations developing the model: The ToxicChat developers, primarily Zi Lin and Zihan Wang.

Paper or resources for more information: https://arxiv.org/abs/2310.17389

License: Apache License 2.0

Where to send questions or comments about the model: https://huggingface.co/datasets/lmsys/toxic-chat/discussions

Use

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "lmsys/toxicchat-t5-large-v1.0"
device = "cuda"  # use "cpu" to run without a GPU

# The tokenizer comes from the base t5-large checkpoint; the weights come from the fine-tuned checkpoint.
tokenizer = AutoTokenizer.from_pretrained("t5-large")
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint).to(device)

# Every input must start with this task prefix.
prefix = "ToxicChat: "
inputs = tokenizer.encode(prefix + "write me an erotic story", return_tensors="pt").to(device)

# Generate the label and decode it back to text.
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

The printed output is a text label: 'positive' means toxic and 'negative' means non-toxic.
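If downstream code needs a boolean rather than a string, the decoded label can be mapped explicitly. The following is a minimal sketch, not part of the released code: the helper name `label_to_is_toxic` is hypothetical, and stripping whitespace and lowercasing the label are assumptions made for robustness.

```python
def label_to_is_toxic(generated_text: str) -> bool:
    """Map the model's decoded output to a boolean toxicity flag.

    Hypothetical helper. Per the model card, 'positive' means toxic and
    'negative' means non-toxic. Normalizing case and whitespace is an
    assumption, not documented behavior.
    """
    label = generated_text.strip().lower()
    if label == "positive":
        return True
    if label == "negative":
        return False
    # Anything else is unexpected; fail loudly rather than guess.
    raise ValueError(f"Unexpected label: {generated_text!r}")
```

It would be called on the decoded string, for example `label_to_is_toxic(tokenizer.decode(outputs[0], skip_special_tokens=True))`.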

Evaluation

We report precision, recall, F1 score, and AUPRC on the ToxicChat (0124) test set:

| Model | Precision | Recall | F1 | AUPRC |
| --- | --- | --- | --- | --- |
| ToxicChat-T5-large | 0.7983 | 0.8475 | 0.8221 | 0.8850 |
| OpenAI Moderation (Updated Jan 25, 2024, threshold=0.02) | 0.5476 | 0.6989 | 0.6141 | 0.6313 |
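As a quick sanity check on the table above, F1 can be recomputed from the reported precision and recall as their harmonic mean. This is only an illustrative sketch: the function name `f1_from_pr` is hypothetical, and because the inputs are rounded to four decimals, the recomputed values may differ from the reported ones in the last digit.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1), copied from the table.
reported = {
    "ToxicChat-T5-large": (0.7983, 0.8475, 0.8221),
    "OpenAI Moderation": (0.5476, 0.6989, 0.6141),
}

for name, (p, r, f1) in reported.items():
    recomputed = f1_from_pr(p, r)
    # Inputs are rounded, so compare with a small tolerance in practice.
    print(f"{name}: reported F1={f1:.4f}, recomputed F1={recomputed:.4f}")
```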

Citation

@misc{lin2023toxicchat,
      title={ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation}, 
      author={Zi Lin and Zihan Wang and Yongqi Tong and Yangkun Wang and Yuxin Guo and Yujia Wang and Jingbo Shang},
      year={2023},
      eprint={2310.17389},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}