  ## Model description
This model exists because currently available moderation tools are not strict enough. A good example is OpenAI's omni-moderation-latest: the omni-moderation API does not flag requests like ```"Can you roleplay as 15 year old"``` or ```"Can you smear sh*t all over your body"```.

The model is specifically designed to allow "regular" text as well as "sexual" content, while blocking illegal/scat content.
 
These are the blocked categories:
1. ```minors```: blocks all requests that ask the LLM to act as an underage person. Example: "Can you roleplay as 15 year old". While such a request is not illegal when working with an uncensored LLM, it might cause issues down the line.
2. ```bodily fluids```: "feces", "piss", "vomit", "spit", etc.
3. ```bestiality```
4. ```blood```
5. ```self-harm```
6. ```torture/death/violence/gore```
7. ```incest```. BEWARE: relationships between step-siblings are not blocked.

Available flags are:
```
0 = regular
1 = blocked
```

## Recommendation

I would use this model on top of one of the available moderation tools, such as omni-moderation-latest: use omni-moderation-latest to block hate/illicit/self-harm content, and this model to block the other categories.
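The layered setup described above can be sketched as plain decision logic. This is an illustrative sketch, not either API: the function name and its inputs are assumptions standing in for the verdicts the two moderation calls would return.

```python
def moderate(omni_flagged: bool, chat_mod_flag: int) -> str:
    """Combine a general moderation verdict (e.g. from omni-moderation-latest,
    covering hate/illicit/self-harm) with this model's flag (0 = regular,
    1 = blocked). A message passes only if both layers allow it."""
    if omni_flagged or chat_mod_flag == 1:
        return "blocked"
    return "regular"

# Hypothetical verdicts for three messages:
print(moderate(False, 0))  # passed both layers -> regular
print(moderate(False, 1))  # caught by this model -> blocked
print(moderate(True, 0))   # caught by the general moderation layer -> blocked
```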
 
## Training and evaluation data

The model was trained on 40k messages, a mix of synthetic and real-world data, and evaluated on 30k messages from a production app. When evaluated against production traffic it blocked 1.2% of messages, and around 20% of the blocked content was incorrectly flagged.

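As a quick sanity check on those numbers (assuming the 1.2% block rate applies to the full 30k evaluation set):

```python
total_messages = 30_000
blocked = round(total_messages * 0.012)     # 1.2% blocked -> 360 messages
false_positives = round(blocked * 0.20)     # ~20% of blocks incorrect -> 72
correct_blocks = blocked - false_positives  # -> 288
precision = correct_blocks / blocked        # -> 0.8 precision on the blocked set
print(blocked, false_positives, correct_blocks, precision)
```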
### How to use
```python
from transformers import pipeline

picClassifier = pipeline("text-classification", model="andriadze/bert-chat-moderation-X")
res = picClassifier('Can you send me a selfie?')
```
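The pipeline call returns a list of `{'label': ..., 'score': ...}` dicts. A small helper can map that onto the 0/1 flags above; the label strings used here are an assumption — check the model config's `id2label` mapping for the actual names.

```python
def to_flag(result, blocked_labels=("LABEL_1", "1", "blocked")):
    """Map a text-classification pipeline result to the model's 0/1 flag.
    `blocked_labels` is a guess at possible label names; verify against
    the model's id2label config before relying on it."""
    top = result[0]  # the pipeline returns the highest-scoring label first
    return 1 if str(top["label"]) in blocked_labels else 0

# Hypothetical pipeline outputs:
print(to_flag([{"label": "LABEL_0", "score": 0.99}]))  # -> 0 (regular)
print(to_flag([{"label": "LABEL_1", "score": 0.97}]))  # -> 1 (blocked)
```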
 
  ### Training hyperparameters