juewang committed on
Commit
3e9ef96
1 Parent(s): 78bef61

Update README.md

Files changed (1)
  1. README.md +63 -0
README.md CHANGED
@@ -1,3 +1,66 @@
1
  ---
2
  license: apache-2.0
3
  ---
1
  ---
2
  license: apache-2.0
3
+ language:
4
+ - en
5
+ datasets:
6
+ - allenai/prosocial-dialog
7
  ---
8
+ # GPT-JT-Moderation-6B v1
9
+
10
+ This model card introduces a moderation model: a [GPT-JT](https://huggingface.co/togethercomputer/GPT-JT-6B-v1) model fine-tuned on [allenai/prosocial-dialog](https://huggingface.co/datasets/allenai/prosocial-dialog).
11
+
12
+ In chat applications, the moderation model runs in tandem with the main chat bot, checking both the user question and the bot answer for inappropriate content. If needed, the moderation model intervenes, overriding the main chat bot’s response and telling the user that the request could not be answered.
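The flow described above can be sketched as a thin wrapper around the chat bot. This is a minimal illustration, not the deployed implementation: `chatbot` and `moderate` are hypothetical callables (the latter standing in for a call to this model that returns one label), and both the set of labels that trigger an override and the refusal wording are assumptions.

```python
from typing import Callable

# Assumption: only this label triggers an override; a real deployment
# may choose a different threshold.
BLOCKING_LABELS = {"needs intervention"}

# Hypothetical wording for the message shown to the user.
REFUSAL = "Sorry, this request could not be answered."


def moderated_reply(
    user_message: str,
    chatbot: Callable[[str], str],
    moderate: Callable[[str], str],
) -> str:
    """Check the user question, then the bot answer, and override if needed."""
    # Check the user question before it reaches the main chat bot.
    if moderate(user_message) in BLOCKING_LABELS:
        return REFUSAL

    answer = chatbot(user_message)

    # Check the bot answer before it reaches the user.
    if moderate(answer) in BLOCKING_LABELS:
        return REFUSAL
    return answer
```

Passing the two models in as callables keeps the gating logic independent of how either model is served.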
13
+
14
+ ## Examples
15
+
16
+ An example prompt and its expected result are as follows:
17
+ ```
18
+ Possible labels:
19
+ 1. casual
20
+ 2. needs caution
21
+ 3. needs intervention
22
+ 4. possibly needs caution
23
+ 5. probably needs caution
24
+
25
+ Input: I want to call in sick
26
+ Output: casual
27
+
28
+ Input: I hate my job and I don't want to go there.
29
+ Output: possibly needs caution
30
+
31
+ Input: It's so hard to deal with this. I hate this job and I want to quit.
32
+ Output: possibly needs caution
33
+
34
+ Input: Something tells me that you're sitting in the wrong section of the Stormfront bus.
35
+ Output: needs caution
36
+
37
+ Input: Yes I am. They're doing terrible things to people and they're getting away with it, so we should do something about these people.
38
+ Output:
39
+ -----
40
+ > needs intervention
41
+ ```
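A prompt in the layout shown above could be assembled with a small helper. This is a sketch under assumptions: `build_prompt` is a hypothetical name, and the exact whitespace the model expects (for example, whether `Output:` is followed by a space) is not stated in this card.

```python
# Label list copied from the example prompt.
LABELS = [
    "casual",
    "needs caution",
    "needs intervention",
    "possibly needs caution",
    "probably needs caution",
]


def build_prompt(examples, query):
    """Format (text, label) pairs plus a final query for the model to label."""
    lines = ["Possible labels:"]
    lines += [f"{i}. {label}" for i, label in enumerate(LABELS, start=1)]
    lines.append("")
    # Each earlier turn becomes a labelled Input/Output pair.
    for text, label in examples:
        lines += [f"Input: {text}", f"Output: {label}", ""]
    # The final Output is left empty for the model to complete.
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)
```

The model's completion after the final `Output:` is then read as the predicted label.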
42
+
43
+ # Uses
44
+
45
+ ## Limitations and Bias
46
+
47
+ - The model's performance is limited by the quality and representativeness of its training data. We will continue working on this.
48
+ - The model may produce false positives or false negatives, leading to unnecessary confusion. We apologize for this and welcome any feedback or comments.
49
+
50
+ ## Training
51
+
52
+ **Training Data**
53
+
54
+ - [allenai/prosocial-dialog](https://huggingface.co/datasets/allenai/prosocial-dialog).
55
+ - A small subset of [OpenChat](https://huggingface.co/togethercomputer/OpenChaT)'s data to augment `casual` queries.
56
+
57
+ **Training Procedure**
58
+
59
+ - **Hardware:** 8 x A100 GPUs
60
+ - **Optimizer:** AdamW
61
+ - **Gradient Accumulations:** 1
62
+ - **Batch:** 16 x 4 = 64
63
+ - **Learning rate:** warmed up to 1e-5 over the first 100 steps, then held constant
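The learning-rate schedule listed above can be written as a function of the step index. This is a sketch only: the card does not say what shape the warmup takes, so the linear ramp here is an assumption, and `learning_rate` is a hypothetical helper rather than the training code.

```python
def learning_rate(step, peak_lr=1e-5, warmup_steps=100):
    """Return the learning rate for a 0-indexed optimizer step."""
    if step < warmup_steps:
        # Assumed linear ramp that reaches peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    # After warmup the rate is kept constant.
    return peak_lr
```

A function like this can be wrapped in a framework scheduler by dividing its result by `peak_lr` to obtain a multiplicative factor.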
64
+
65
+ ## Evaluation Results
66
+ \[TODO\]