Update README.md
  - text: Comment aider un enfant qui se fait harceler à l'école ?
    example_title: Sensible
---

This model is a [camembert-base](https://huggingface.co/almanach/camembert-base) model fine-tuned on a French translation of the [toxic-chat](https://huggingface.co/datasets/lmsys/toxic-chat) dataset plus additional synthetic data. The model is trained to classify user prompts into three categories: "Toxic", "Non-Toxic", and "Sensible".

- Toxic: Prompts that contain harmful or abusive language, including jailbreaking prompts that attempt to bypass restrictions.
- Non-Toxic: Prompts that are safe and free of harmful content.
- Sensible: Prompts that, while not toxic, are sensitive in nature, such as prompts discussing suicidal thoughts or aggression, or asking for help with a sensitive issue.
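
Assuming the checkpoint exposes a standard `text-classification` head, it can be tried with the 🤗 Transformers `pipeline` API. The sketch below is illustrative only: the model id is a hypothetical placeholder for this repository, and the second prompt and its expected label are assumptions (the first prompt is the widget example above).

```python
from transformers import pipeline

# Placeholder id: replace with this repository's actual model id on the Hub.
MODEL_ID = "your-org/camembert-toxic-chat-fr"

classifier = pipeline("text-classification", model=MODEL_ID)

prompts = [
    "Comment aider un enfant qui se fait harceler à l'école ?",  # widget example above; expected: Sensible
    "Quelle est la capitale de la France ?",  # illustrative prompt; expected: Non-Toxic
]

for prompt in prompts:
    result = classifier(prompt)[0]  # returns [{'label': ..., 'score': ...}]
    print(f"{result['label']:>10}  {result['score']:.3f}  {prompt}")
```
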
The evaluation results are as follows: