---
language:
- fr
license: apache-2.0
tags:
- text-classification
datasets:
- lmsys/toxic-chat
widget:
- text: Comment manipuler quelqu'un ?
  example_title: Toxic
- text: J'aimerais adopter un chien. Où devrais-je aller ?
  example_title: Non Toxic
- text: Comment aider un enfant qui se fait harceler à l'école ?
  example_title: Sensible
---

This model is a [camembert-base](https://huggingface.co/almanach/camembert-base) model fine-tuned on a French translation of the [toxic-chat](https://huggingface.co/datasets/lmsys/toxic-chat) dataset, plus additional synthetic data. The model is trained to classify user prompts into three categories: "Toxic", "Non-Toxic", and "Sensible" (French for "sensitive").

- Toxic: prompts that contain harmful or abusive language, including jailbreak prompts that attempt to bypass a model's restrictions.
- Non-Toxic: prompts that are safe and free of harmful content.
- Sensible: prompts that are not toxic but are sensitive in nature, such as those discussing suicidal thoughts or aggression, or asking for help with a delicate issue.

The evaluation results are as follows:

|                  | Precision | Recall | F1-Score |
|------------------|:---------:|:------:|:--------:|
| **Non-Toxic**    | 0.96      | 0.91   | 0.93     |
| **Sensible**     | 0.93      | 1.00   | 0.97     |
| **Toxic**        | 0.88      | 0.93   | 0.91     |
|                  |           |        |          |
| **Accuracy**     |           |        | 0.93     |
| **Macro Avg**    | 0.93      | 0.95   | 0.94     |
| **Weighted Avg** | 0.93      | 0.93   | 0.93     |
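A minimal usage sketch with the `transformers` text-classification pipeline. `MODEL_ID` is a placeholder, not this model's actual Hub id, and the label strings are assumed to match the three classes described above — adjust both to the real repository.

```python
from typing import Callable, List

# Placeholder repo id — substitute this model's actual Hugging Face Hub id.
MODEL_ID = "camembert-toxic-chat-fr"

def moderate(prompts: List[str], classifier: Callable) -> List[str]:
    """Return the predicted label ("Toxic", "Non-Toxic", or "Sensible")
    for each French prompt, given any pipeline-like classifier."""
    return [prediction["label"] for prediction in classifier(prompts)]

def build_classifier():
    # Requires `pip install transformers`; downloads the model on first use.
    from transformers import pipeline
    return pipeline("text-classification", model=MODEL_ID)
```

Keeping the pipeline construction separate from `moderate` lets the moderation logic be exercised with any callable that returns `{"label": ..., "score": ...}` dicts, which is convenient for testing without downloading the model.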
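As a sanity check, the per-class F1 scores and the macro average in the table follow from the reported precision and recall via the standard formula F1 = 2PR / (P + R), to within rounding of the table's two-decimal entries:

```python
# (precision, recall) per class, copied from the evaluation table above.
metrics = {
    "Non-Toxic": (0.96, 0.91),
    "Sensible":  (0.93, 1.00),
    "Toxic":     (0.88, 0.93),
}

def f1(precision: float, recall: float) -> float:
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

f1_scores = {label: f1(p, r) for label, (p, r) in metrics.items()}
macro_f1 = sum(f1_scores.values()) / len(f1_scores)

for label, score in f1_scores.items():
    print(f"{label}: F1 = {score:.3f}")
print(f"Macro avg F1 = {macro_f1:.3f}")
```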