---
library_name: transformers
license: apache-2.0
language:
- en
tags:
- causal-lm
- Large Language Model
- LLM
- detoxification
- unbias
- bias
- instruction
- finetuned
- llama2
- DPO
---
# Model Card for SungJoo/llama2-7b-sft-dpo-detox
## Model Details
### Model Description
This model is built on the LLaMA-2-7b architecture and has been refined with instruction tuning and Direct Preference Optimization (DPO).
- **Developed by:** Sungjoo Byun (Grace Byun)
- **Model type:** Auto-regressive language model
- **Language(s) (NLP):** English
- **License:** Apache License 2.0
- **Finetuned from:** meta-llama/Llama-2-7b-hf
### Model Sources
- **Repository:** TBD
- **Paper:** TBD
## Uses
This model is intended for generating less toxic language in applications such as chatbots and other NLP systems.
## Bias, Risks, and Limitations
While this model aims to reduce toxicity, it may still generate biased or harmful content. Users should apply this model with caution and review outputs for sensitive applications.
## How to Get Started with the Model
Use the code below to get started with the model:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("SungJoo/llama2-7b-sft-dpo-detox")
model = AutoModelForCausalLM.from_pretrained("SungJoo/llama2-7b-sft-dpo-detox")
```
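Once loaded, the model can be used for generation as in the sketch below; the prompt and decoding settings are illustrative and not taken from the model card.

```python
import torch

# Illustrative prompt; any instruction-style text works here.
prompt = "Rewrite the following comment so it is polite: 'Your idea is stupid.'"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=True,
        temperature=0.7,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```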
## Training Details
- **Parameter-Efficient Fine-Tuning (PEFT):** The model was fine-tuned with parameter-efficient methods rather than full-parameter updates.
- **BitsAndBytes configuration (`bnb_config`):** Training used 4-bit quantization via the BitsAndBytes library to further enhance training efficiency; a sketch of such a configuration is shown below.
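The model card does not list the exact quantization settings, so the following is only a sketch of a typical 4-bit BitsAndBytes configuration; the specific dtype and quantization options are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumed 4-bit quantization settings (nf4, bfloat16 compute, double quantization).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load the stated base model with the quantization config applied.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
```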
### Training Data
The model was trained using a dataset specifically created to detoxify LLMs. The DPO dataset will be made publicly available soon.
### Training Procedure
DPO was applied to "SungJoo/llama2-7b-sft-detox" with the following hyperparameters:
| **Hyperparameter** | **Value** |
|--------------------|-----------|
| Batch size | 4 |
| Learning rate | 2e-4 |
| Epochs | 10 |
| Max length | 2,048 |
| Max prompt length | 1,024 |
| Beta | 0.1 |
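For reference, a minimal sketch of how these hyperparameters map onto the TRL library's `DPOTrainer` is shown below; `dpo_dataset` is a placeholder for the not-yet-released preference dataset, and argument names may differ slightly across `trl` versions.

```python
from transformers import TrainingArguments
from trl import DPOTrainer

# Hyperparameters from the table above.
training_args = TrainingArguments(
    output_dir="llama2-7b-sft-dpo-detox",
    per_device_train_batch_size=4,
    learning_rate=2e-4,
    num_train_epochs=10,
)

trainer = DPOTrainer(
    model,                       # policy model, initialized from SungJoo/llama2-7b-sft-detox
    ref_model=None,              # with PEFT adapters, the frozen base weights act as the reference
    args=training_args,
    beta=0.1,
    train_dataset=dpo_dataset,   # placeholder: preference data with prompt/chosen/rejected fields
    tokenizer=tokenizer,
    max_length=2048,
    max_prompt_length=1024,
)
trainer.train()
```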
## Objective
The main objective of this research is to reduce toxicity in LLMs by applying instruction tuning and Direct Preference Optimization (DPO).
A comprehensive instruction and DPO dataset was constructed for this purpose, which will be released in the future.
| **Category** | **LLaMA-2-base >=0.5 (%)** | **Count** | **Finetuned LLaMA-2 >=0.5 (%)** | **Count** | **DPO LLaMA-2 >=0.5 (%)** | **Count** |
|--------------------|------|-------|------|-------|------|-------|
| **TOXICITY** | 4.46 | 4,438 | 3.61 <span style="color:blue;">(-0.85)</span> | 3,593 <span style="color:blue;">(-845)</span> | 2.39 <span style="color:green;">(-1.22)</span> | 2,377 <span style="color:green;">(-1,216)</span> |
| **SEVERE_TOXICITY** | 0.08 | 77 | 0.07 <span style="color:blue;">(-0.01)</span> | 70 <span style="color:blue;">(-7)</span> | 0.03 <span style="color:green;">(-0.04)</span> | 31 <span style="color:green;">(-39)</span> |
| **IDENTITY_ATTACK** | 0.79 | 788 | 0.42 <span style="color:blue;">(-0.37)</span> | 413 <span style="color:blue;">(-375)</span> | 0.28 <span style="color:green;">(-0.14)</span> | 274 <span style="color:green;">(-139)</span> |
| **INSULT** | 1.97 | 1,961 | 1.60 <span style="color:blue;">(-0.37)</span> | 1,588 <span style="color:blue;">(-373)</span> | 0.90 <span style="color:green;">(-0.70)</span> | 892 <span style="color:green;">(-696)</span> |
| **PROFANITY** | 2.10 | 2,086 | 1.76 <span style="color:blue;">(-0.34)</span> | 1,753 <span style="color:blue;">(-333)</span> | 1.04 <span style="color:green;">(-0.72)</span> | 1,030 <span style="color:green;">(-723)</span> |
| **THREAT** | 1.43 | 1,424 | 0.92 <span style="color:blue;">(-0.51)</span> | 919 <span style="color:blue;">(-505)</span> | 0.76 <span style="color:green;">(-0.16)</span> | 754 <span style="color:green;">(-165)</span> |
*Comparison of LLaMA-2-base, Finetuned LLaMA-2, and DPO LLaMA-2 across Perspective API categories. Reductions in blue are relative to the base model; reductions in green are relative to the fine-tuned model.*
The table above shows the effectiveness of this model in reducing toxicity, measured on the RealToxicityPrompts dataset with the Perspective API.
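For context, toxicity scores of this kind are typically obtained by sending generated continuations to the Perspective API. The sketch below shows one way to do this with the Google API client; the API key and attribute list are assumptions, not details from the paper.

```python
from googleapiclient import discovery

API_KEY = "YOUR_PERSPECTIVE_API_KEY"  # placeholder

# Build a client for the Perspective (Comment Analyzer) API.
client = discovery.build(
    "commentanalyzer",
    "v1alpha1",
    developerKey=API_KEY,
    discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
    static_discovery=False,
)

# Score a single generated continuation for a few toxicity attributes.
request = {
    "comment": {"text": "Example model continuation to score."},
    "requestedAttributes": {"TOXICITY": {}, "SEVERE_TOXICITY": {}, "INSULT": {}},
}
response = client.comments().analyze(body=request).execute()
print(response["attributeScores"]["TOXICITY"]["summaryScore"]["value"])
```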
## Contact
For any questions or issues, please contact [email protected].