	---
	license: apache-2.0
	language:
	- en
	datasets:
	- allenai/prosocial-dialog
	---
	# GPT-JT-Moderation-6B v1

	This model card introduces a moderation model, a [GPT-JT](https://huggingface.co/togethercomputer/GPT-JT-6B-v1) model fine-tuned on Ontocord.ai's [OIG-moderation dataset v0.1](https://huggingface.co/datasets/ontocord/OIG-moderation).

	This model can be used to moderate other chatbot models, including [GPT-NeoXT-Chat-Base-20B](https://huggingface.co/togethercomputer/GPT-NeoXT-Chat-Base-20B).

	In chat applications, the moderation model runs in tandem with the main chatbot, checking both the user question and the bot answer for inappropriate content. If needed, the moderation model intervenes, overriding the main chatbot's response and telling the user that the request could not be answered.
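The flow described above can be sketched as a small wrapper around the chatbot. This is a minimal illustration, not the deployed implementation: `moderated_reply`, `BLOCKING_LABELS`, and `REFUSAL` are hypothetical names, the `chat` and `classify` callables stand in for the chat model and the moderation model, and treating only `needs intervention` as blocking is an assumption.

```python
from typing import Callable

# Assumption: only this label triggers an override. The card does not state
# which labels cause the moderation model to intervene.
BLOCKING_LABELS = {"needs intervention"}

# Hypothetical refusal text shown to the user.
REFUSAL = "Sorry, this request could not be answered."


def moderated_reply(
    user_message: str,
    chat: Callable[[str], str],
    classify: Callable[[str], str],
) -> str:
    """Return the chatbot's answer unless moderation flags either side."""
    # Check the user question first, so flagged input never reaches the chatbot.
    if classify(user_message) in BLOCKING_LABELS:
        return REFUSAL
    answer = chat(user_message)
    # Check the bot answer as well before it is shown to the user.
    if classify(answer) in BLOCKING_LABELS:
        return REFUSAL
    return answer
```

Passing the two models in as callables keeps the gating logic independent of how each model is served.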

	## Examples

	An example prompt and its expected result are as follows:
	```
	Possible labels:
	1. casual
	2. needs caution
	3. needs intervention
	4. possibly needs caution
	5. probably needs caution

	Input: I want to call in sick
	Output: casual

	Input: I hate my job and I don't want to go there.
	Output: possibly needs caution

	Input: It's so hard to deal with this. I hate this job and I want to quit.
	Output: possibly needs caution

	Input: Something tells me that you're sitting in the wrong section of the Stormfront bus.
	Output: needs caution

	Input: Yes I am. They're doing terrible things to people and they're getting away with it, so we should do something about these people.
	Output:
	-----
	> needs intervention
	```
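A prompt in the format above can be assembled programmatically. The sketch below is one possible way to do it: `build_prompt` and `parse_label` are hypothetical helpers, and it assumes the model's completion begins with one of the five label strings. The resulting prompt would then be passed to whatever text-generation interface serves the model.

```python
LABELS = [
    "casual",
    "needs caution",
    "needs intervention",
    "possibly needs caution",
    "probably needs caution",
]


def build_prompt(examples, query):
    """Build a few-shot prompt from (text, label) pairs and a final query."""
    lines = ["Possible labels:"]
    lines += [f"{i}. {label}" for i, label in enumerate(LABELS, start=1)]
    lines.append("")
    for text, label in examples:
        lines += [f"Input: {text}", f"Output: {label}", ""]
    # Leave the last "Output:" empty so the model completes it with a label.
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)


def parse_label(completion):
    """Map the first line of generated text to a known label, else None."""
    stripped = completion.strip()
    if not stripped:
        return None
    first_line = stripped.splitlines()[0].strip().lower()
    for label in LABELS:
        if first_line.startswith(label):
            return label
    return None


prompt = build_prompt(
    [("I want to call in sick", "casual")],
    "I hate my job and I don't want to go there.",
)
```

Returning `None` for unrecognized completions lets the caller decide how to handle output that does not match any label.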

	# Uses

	## Limitations and Bias

	- The model's performance is limited by the quality and representativeness of its training data. We will continue working on this.
	- The model may produce false positives or false negatives, which can cause confusion. We apologize for this and welcome any feedback or comments.

	## Training

	Training Data

	- [allenai/prosocial-dialog](https://huggingface.co/datasets/allenai/prosocial-dialog).
	- A small subset of LAION's [OIG dataset](https://huggingface.co/datasets/laion/OIG) to augment casual queries.
	- The processed data can be found in the OIG-moderation repository [here](https://huggingface.co/datasets/ontocord/OIG-moderation/resolve/main/OIG_safety_v0.1.jsonl).

	Training Procedure

	- Hardware: 8 x A100 GPUs
	- Optimizer: AdamW
	- Gradient accumulation steps: 1
	- Batch: 16 x 4 = 64
	- Learning rate: warm up to 1e-5 over 100 steps, then held constant
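
The learning rate schedule listed above can be written as a function of the optimizer step. This is a sketch only: the card gives the peak value (1e-5) and the warmup length (100 steps) but not the shape of the warmup, so the linear ramp here is an assumption, and `learning_rate` is a hypothetical helper name.

```python
PEAK_LR = 1e-5       # from the training procedure above
WARMUP_STEPS = 100   # from the training procedure above


def learning_rate(step):
    """Learning rate for a 0-based optimizer step.

    Assumes a linear ramp during warmup; the source does not specify the shape.
    """
    if step < WARMUP_STEPS:
        # Ramp up so that the last warmup step reaches the peak value.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    # After warmup the rate is kept constant.
    return PEAK_LR
```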