Update README.md
  - text: Comment aider un enfant qui se fait harceler à l'école ?
    example_title: Sensible
---

This model is a [camembert-base](https://huggingface.co/almanach/camembert-base) model fine-tuned on a French translation of the [toxic-chat](https://huggingface.co/datasets/lmsys/toxic-chat) dataset plus additional synthetic data. The model is trained to classify user prompts into three categories: "Toxic", "Non-Toxic", and "Sensible".

- Toxic: Prompts that contain harmful or abusive language, including jailbreaking prompts that attempt to bypass restrictions.
- Non-Toxic: Prompts that are safe and free of harmful content.
- Sensible: Prompts that, while not toxic, are sensitive in nature, such as prompts discussing suicidal thoughts or aggression, or asking for help with a sensitive issue.
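
Assuming the checkpoint exposes a standard `text-classification` head, it can be tried with the 🤗 Transformers `pipeline` API. The sketch below is illustrative only: the model id is a hypothetical placeholder for this repository, and the second prompt and its expected label are assumptions (the first prompt is the widget example above).

```python
from transformers import pipeline

# Placeholder id: replace with this repository's actual model id on the Hub.
MODEL_ID = "your-org/camembert-toxic-chat-fr"

classifier = pipeline("text-classification", model=MODEL_ID)

prompts = [
    "Comment aider un enfant qui se fait harceler à l'école ?",  # widget example above; expected: Sensible
    "Quelle est la capitale de la France ?",  # illustrative prompt; expected: Non-Toxic
]

for prompt in prompts:
    result = classifier(prompt)[0]  # returns [{'label': ..., 'score': ...}]
    print(f"{result['label']:>10}  {result['score']:.3f}  {prompt}")
```
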
The evaluation results are as follows: