---
library_name: transformers
license: apache-2.0
language:
- en
tags:
- causal-lm
- Large Language Model
- LLM
- detoxification
- unbias
- bias
- instruction
- finetuned
- llama2
- DPO
---
# Model Card for SungJoo/llama2-7b-sft-dpo-detox
## Model Details
### Model Description
This model is built on the LLaMA-2-7b architecture and has been refined with instruction tuning and Direct Preference Optimization (DPO).
- **Developed by:** Sungjoo Byun (Grace Byun)
- **Model type:** Auto-regressive language model
- **Language(s) (NLP):** English
- **License:** Apache License 2.0
- **Finetuned from:** meta-llama/Llama-2-7b-hf
### Model Sources
- **Repository:** TBD
- **Paper:** TBD
## Uses
This model is intended for generating less toxic language in applications such as chatbots and other NLP systems.
## Bias, Risks, and Limitations
While this model aims to reduce toxicity, it may still generate biased or harmful content. Users should apply this model with caution and review outputs for sensitive applications.
## How to Get Started with the Model
Use the code below to get started with the model:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("SungJoo/llama2-7b-sft-dpo-detox")
model = AutoModelForCausalLM.from_pretrained("SungJoo/llama2-7b-sft-dpo-detox")
```
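Once loaded, the model can be used for generation as in the sketch below; the prompt and decoding settings are illustrative and not taken from the model card.

```python
import torch

# Illustrative prompt; any instruction-style text works here.
prompt = "Rewrite the following comment so it is polite: 'Your idea is stupid.'"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=True,
        temperature=0.7,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```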
## Training Details
- **Parameter-Efficient Fine-Tuning (PEFT):** The model was fine-tuned with parameter-efficient methods rather than full-parameter updates.
- **BitsAndBytes configuration (`bnb_config`):** Training used 4-bit quantization via the BitsAndBytes library to further enhance training efficiency; a sketch of such a configuration is shown below.
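The model card does not list the exact quantization settings, so the following is only a sketch of a typical 4-bit BitsAndBytes configuration; the specific dtype and quantization options are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumed 4-bit quantization settings (nf4, bfloat16 compute, double quantization).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load the stated base model with the quantization config applied.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
```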
### Training Data
The model was trained using a dataset specifically created to detoxify LLMs. The DPO dataset will be made publicly available soon.
### Training Procedure
DPO was applied to "SungJoo/llama2-7b-sft-detox" with the following hyperparameters:
| **Hyperparameter** | **Value** |
|--------------------|-----------|
| Batch size | 4 |
| Learning rate | 2e-4 |
| Epochs | 10 |
| Max length | 2,048 |
| Max prompt length | 1,024 |
| Beta | 0.1 |
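For reference, a minimal sketch of how these hyperparameters map onto the TRL library's `DPOTrainer` is shown below; `dpo_dataset` is a placeholder for the not-yet-released preference dataset, and argument names may differ slightly across `trl` versions.

```python
from transformers import TrainingArguments
from trl import DPOTrainer

# Hyperparameters from the table above.
training_args = TrainingArguments(
    output_dir="llama2-7b-sft-dpo-detox",
    per_device_train_batch_size=4,
    learning_rate=2e-4,
    num_train_epochs=10,
)

trainer = DPOTrainer(
    model,                       # policy model, initialized from SungJoo/llama2-7b-sft-detox
    ref_model=None,              # with PEFT adapters, the frozen base weights act as the reference
    args=training_args,
    beta=0.1,
    train_dataset=dpo_dataset,   # placeholder: preference data with prompt/chosen/rejected fields
    tokenizer=tokenizer,
    max_length=2048,
    max_prompt_length=1024,
)
trainer.train()
```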
## Objective
The main objective of this research is to reduce toxicity in LLMs by applying instruction tuning and Direct Preference Optimization (DPO).
A comprehensive instruction and DPO dataset was constructed for this purpose, which will be released in the future.
| **Category** | **LLaMA-2-base >=0.5 (%)** | **Count** | **Finetuned LLaMA-2 >=0.5 (%)** | **Count** | **DPO LLaMA-2 >=0.5 (%)** | **Count** |
|--------------------|------|-------|------|-------|------|-------|
| **TOXICITY** | 4.46 | 4,438 | 3.61 <span style="color:blue;">(-0.85)</span> | 3,593 <span style="color:blue;">(-845)</span> | 2.39 <span style="color:green;">(-1.22)</span> | 2,377 <span style="color:green;">(-1,216)</span> |
| **SEVERE_TOXICITY** | 0.08 | 77 | 0.07 <span style="color:blue;">(-0.01)</span> | 70 <span style="color:blue;">(-7)</span> | 0.03 <span style="color:green;">(-0.04)</span> | 31 <span style="color:green;">(-39)</span> |
| **IDENTITY_ATTACK** | 0.79 | 788 | 0.42 <span style="color:blue;">(-0.37)</span> | 413 <span style="color:blue;">(-375)</span> | 0.28 <span style="color:green;">(-0.14)</span> | 274 <span style="color:green;">(-139)</span> |
| **INSULT** | 1.97 | 1,961 | 1.60 <span style="color:blue;">(-0.37)</span> | 1,588 <span style="color:blue;">(-373)</span> | 0.90 <span style="color:green;">(-0.70)</span> | 892 <span style="color:green;">(-696)</span> |
| **PROFANITY** | 2.10 | 2,086 | 1.76 <span style="color:blue;">(-0.34)</span> | 1,753 <span style="color:blue;">(-333)</span> | 1.04 <span style="color:green;">(-0.72)</span> | 1,030 <span style="color:green;">(-723)</span> |
| **THREAT** | 1.43 | 1,424 | 0.92 <span style="color:blue;">(-0.51)</span> | 919 <span style="color:blue;">(-505)</span> | 0.76 <span style="color:green;">(-0.16)</span> | 754 <span style="color:green;">(-165)</span> |
*Comparison of LLaMA-2-base, Finetuned LLaMA-2, and DPO LLaMA-2 across Perspective API categories. Reductions in blue are relative to the base model; reductions in green are relative to the fine-tuned model.*
The table above shows the effectiveness of this model in reducing toxicity, measured on the RealToxicityPrompts dataset with the Perspective API.
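For context, toxicity scores of this kind are typically obtained by sending generated continuations to the Perspective API. The sketch below shows one way to do this with the Google API client; the API key and attribute list are assumptions, not details from the paper.

```python
from googleapiclient import discovery

API_KEY = "YOUR_PERSPECTIVE_API_KEY"  # placeholder

# Build a client for the Perspective (Comment Analyzer) API.
client = discovery.build(
    "commentanalyzer",
    "v1alpha1",
    developerKey=API_KEY,
    discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
    static_discovery=False,
)

# Score a single generated continuation for a few toxicity attributes.
request = {
    "comment": {"text": "Example model continuation to score."},
    "requestedAttributes": {"TOXICITY": {}, "SEVERE_TOXICITY": {}, "INSULT": {}},
}
response = client.comments().analyze(body=request).execute()
print(response["attributeScores"]["TOXICITY"]["summaryScore"]["value"])
```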
## Contact
For any questions or issues, please contact [email protected].