ONNX version of unitary/unbiased-toxic-roberta
This model is a conversion of unitary/unbiased-toxic-roberta to ONNX format using the 🤗 Optimum library.
Trained models & code to predict toxic comments on 3 Jigsaw challenges: Toxic comment classification, Unintended Bias in Toxic comments, Multilingual toxic comment classification.
Built by Laura Hanu at Unitary.
⚠️ Disclaimer: The Hugging Face models currently give different results from the detoxify library (see issue here).
Labels
All challenges have a toxicity label. The toxicity labels represent the aggregate ratings of up to 10 annotators according to the following schema:
- Very Toxic (a very hateful, aggressive, or disrespectful comment that is very likely to make you leave a discussion or give up on sharing your perspective)
- Toxic (a rude, disrespectful, or unreasonable comment that is somewhat likely to make you leave a discussion or give up on sharing your perspective)
- Hard to Say
- Not Toxic
More information about the labelling schema can be found here.
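As a rough illustration of what "aggregate ratings" can mean, the sketch below turns a list of per-annotator ratings into a single fraction. It assumes the aggregate is the share of annotators who chose Toxic or Very Toxic; the authoritative definition is the labelling schema referenced above. The function name and the sample ratings are hypothetical.

```python
from collections import Counter

# Rating names follow the schema above. Counting only these two ratings
# as "toxic" is an assumption made for this sketch.
TOXIC_RATINGS = {"Very Toxic", "Toxic"}


def toxicity_fraction(ratings):
    """Return the share of annotators who rated a comment Toxic or Very Toxic."""
    if not ratings:
        raise ValueError("at least one rating is required")
    counts = Counter(ratings)
    toxic_votes = sum(counts[r] for r in TOXIC_RATINGS)
    return toxic_votes / len(ratings)


# Hypothetical ratings from 10 annotators for a single comment.
sample = ["Toxic"] * 3 + ["Very Toxic"] + ["Hard to Say"] + ["Not Toxic"] * 5
print(toxicity_fraction(sample))  # 4 of 10 annotators -> 0.4
```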
Toxic Comment Classification Challenge
This challenge includes the following labels:
- toxic
- severe_toxic
- obscene
- threat
- insult
- identity_hate
Jigsaw Unintended Bias in Toxicity Classification
This challenge has 2 types of labels: the main toxicity labels and some additional identity labels that represent the identities mentioned in the comments.
Only identities with more than 500 examples in the test set (combined public and private) are included during training as additional labels and in the evaluation calculation.
- toxicity
- severe_toxicity
- obscene
- threat
- insult
- identity_attack
- sexual_explicit
Identity labels used:
- male
- female
- homosexual_gay_or_lesbian
- christian
- jewish
- muslim
- black
- white
- psychiatric_or_mental_illness
A complete list of all the identity labels available can be found here.
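When post-processing predictions for this challenge, it can help to keep the two label groups apart. The sketch below copies the label names listed above into constants and splits a score dictionary into its toxicity and identity parts. It assumes the model's output labels use exactly these names; the function name and the example scores are hypothetical.

```python
# Label names copied from the lists above.
TOXICITY_LABELS = [
    "toxicity", "severe_toxicity", "obscene", "threat",
    "insult", "identity_attack", "sexual_explicit",
]
IDENTITY_LABELS = [
    "male", "female", "homosexual_gay_or_lesbian", "christian", "jewish",
    "muslim", "black", "white", "psychiatric_or_mental_illness",
]


def split_scores(scores):
    """Split a {label: score} dict into (toxicity scores, identity scores)."""
    toxicity = {k: v for k, v in scores.items() if k in TOXICITY_LABELS}
    identity = {k: v for k, v in scores.items() if k in IDENTITY_LABELS}
    return toxicity, identity


# Hypothetical scores, not real model output.
example_scores = {"toxicity": 0.91, "insult": 0.85, "female": 0.02}
toxicity, identity = split_scores(example_scores)
print(toxicity)
print(identity)
```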
Usage
Optimum
Loading the model requires the 🤗 Optimum library to be installed.
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

# Load the tokenizer and the ONNX model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("laiyer/unbiased-toxic-roberta-onnx")
model = ORTModelForSequenceClassification.from_pretrained("laiyer/unbiased-toxic-roberta-onnx")

# Build a text-classification pipeline that runs the ONNX model
classifier = pipeline(
    task="text-classification",
    model=model,
    tokenizer=tokenizer,
)

classifier_output = classifier("It's not toxic comment")
print(classifier_output)
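Because the labels are not mutually exclusive, it is often useful to look at every label's score and not only the top one. The sketch below assumes the pipeline is called with `top_k=None`, which in recent versions of transformers returns one `{"label", "score"}` dictionary per label for a single input string; the exact output shape may differ between versions. The helper name, the 0.5 threshold, and the sample scores are illustrative and not part of the original model card.

```python
def labels_above_threshold(predictions, threshold=0.5):
    """Return the labels scoring at or above the threshold, highest score first."""
    kept = [p for p in predictions if p["score"] >= threshold]
    kept.sort(key=lambda p: p["score"], reverse=True)
    return [p["label"] for p in kept]


# With the classifier built above, all scores could be requested with:
#   predictions = classifier("some text", top_k=None)
# Hypothetical scores stand in for real model output here.
predictions = [
    {"label": "insult", "score": 0.71},
    {"label": "toxicity", "score": 0.93},
    {"label": "threat", "score": 0.02},
]
print(labels_above_threshold(predictions))  # ['toxicity', 'insult']
```

The threshold is a design choice: a lower value flags more comments at the cost of more false positives, so it is worth tuning on data from the target application.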
LLM Guard
Community
Join our Slack to give us feedback, connect with the maintainers and fellow users, ask questions, or engage in discussions about LLM security!