---
library_name: transformers
license: apache-2.0
language:
- en
tags:
- causal-lm
- Large Language Model
- LLM
- detoxification
- unbias
- bias
- instruction
- finetuned
- llama2
- DPO
---

# Model Card for SungJoo/llama2-7b-sft-detox

This model is an instruction-tuned version of meta-llama/Llama-2-7b-hf, fine-tuned to reduce toxicity in Large Language Models (LLMs).

## Model Details

### Model Description

This model is built on the LLaMA-2-7b architecture and has been refined with instruction tuning and Direct Preference Optimization (DPO).

- **Developed by:** Sungjoo Byun (Grace Byun)
- **Model type:** Auto-regressive language model
- **Language(s) (NLP):** English
- **License:** Apache License 2.0
- **Finetuned from:** meta-llama/Llama-2-7b-hf

### Model Sources

- **Repository:** TBD
- **Paper:** TBD

## Uses

This model is intended for generating less toxic language in applications such as chatbots and other NLP systems.

## Bias, Risks, and Limitations

While this model aims to reduce toxicity, it may still generate biased or harmful content. Users should apply it with caution and review its outputs in sensitive applications.

## How to Get Started with the Model

Use the code below to get started with the model:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("SungJoo/llama2-7b-sft-dpo-detox")
model = AutoModelForCausalLM.from_pretrained("SungJoo/llama2-7b-sft-dpo-detox")
```

## Training Details

- Parameter-Efficient Fine-Tuning (PEFT) was used instead of full-parameter fine-tuning.
- BitsAndBytes configuration (`bnb_config`): the model was trained with 4-bit quantization via the BitsAndBytes library to further improve training efficiency.

Illustrative configuration sketches for the PEFT/quantization setup and the DPO stage are provided after the Objective section below.

### Training Data

The model was trained on a dataset created specifically to detoxify LLMs. The DPO dataset will be made publicly available soon.

### Training Procedure

The model was trained with efficient fine-tuning techniques using the following hyperparameters:

| **Hyperparameter** | **Value** |
|--------------------|-----------|
| Batch size         | 4         |
| Learning rate      | 2e-4      |
| Epochs             | 10        |
| Max length         | 2,048     |
| Max prompt length  | 1,024     |
| Beta               | 0.1       |

*Hyperparameters used when applying DPO to LLaMA-2.*

## Objective

The main objective of this research is to reduce toxicity in LLMs by applying instruction tuning and Direct Preference Optimization (DPO). A comprehensive instruction and DPO dataset was constructed for this purpose and will be released in the future.
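### Illustrative training configuration

The Training Details section above lists PEFT and a 4-bit BitsAndBytes configuration without spelling them out. The snippet below is a minimal sketch, assuming Hugging Face `transformers`, `bitsandbytes`, and `peft`; the quantization dtype and the LoRA hyperparameters (`r`, `lora_alpha`, target modules) are illustrative assumptions, not the configuration actually used for this model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit quantization via BitsAndBytes; exact settings are assumed, not taken from this card
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# LoRA adapters for parameter-efficient fine-tuning; values are illustrative
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```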
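Building on the setup above, the following sketch shows how the DPO stage could be run with the hyperparameters listed under Training Procedure (batch size 4, learning rate 2e-4, 10 epochs, max length 2,048, max prompt length 1,024, beta 0.1). It assumes TRL's `DPOTrainer`; the dataset path and its `prompt`/`chosen`/`rejected` format are placeholders, since the DPO data has not yet been released, and argument names vary across TRL versions (newer releases accept `processing_class` in place of `tokenizer`).

```python
from datasets import load_dataset
from trl import DPOConfig, DPOTrainer

# Placeholder path: the card's DPO dataset is not yet public.
# Records are expected to contain "prompt", "chosen", and "rejected" fields.
train_dataset = load_dataset("json", data_files="detox_dpo_pairs.json", split="train")

# Hyperparameters taken from the Training Procedure table above.
dpo_config = DPOConfig(
    output_dir="llama2-7b-detox-dpo",
    per_device_train_batch_size=4,
    learning_rate=2e-4,
    num_train_epochs=10,
    max_length=2048,
    max_prompt_length=1024,
    beta=0.1,
)

trainer = DPOTrainer(
    model=model,        # PEFT-wrapped 4-bit model from the previous sketch
    ref_model=None,     # with PEFT adapters, TRL derives the reference model implicitly
    args=dpo_config,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```

This is not the author's released training script; it only illustrates how the documented hyperparameters map onto a standard DPO setup.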
The table below shows the effectiveness of this model in reducing toxicity, measured on the RealToxicityPrompts dataset using the Perspective API.

| **Model** | **LLaMA-2-base** | | **Finetuned LLaMA-2** | | **DPO LLaMA-2** | |
|---------------------|---------------|-----------|---------------|-----------|---------------|-----------|
| **Category** | **≥ 0.5 (%)** | **Count** | **≥ 0.5 (%)** | **Count** | **≥ 0.5 (%)** | **Count** |
| **TOXICITY** | 4.46 | 4,438 | 3.61 | 3,593 | 2.39 | 2,377 |
| | | | (-0.85) | (-845) | (-1.22) | (-1,216) |
| **SEVERE_TOXICITY** | 0.08 | 77 | 0.07 | 70 | 0.03 | 31 |
| | | | (-0.01) | (-7) | (-0.04) | (-39) |
| **IDENTITY_ATTACK** | 0.79 | 788 | 0.42 | 413 | 0.28 | 274 |
| | | | (-0.37) | (-375) | (-0.14) | (-139) |
| **INSULT** | 1.97 | 1,961 | 1.60 | 1,588 | 0.90 | 892 |
| | | | (-0.37) | (-373) | (-0.70) | (-696) |
| **PROFANITY** | 2.10 | 2,086 | 1.76 | 1,753 | 1.04 | 1,030 |
| | | | (-0.34) | (-333) | (-0.72) | (-723) |
| **THREAT** | 1.43 | 1,424 | 0.92 | 919 | 0.76 | 754 |
| | | | (-0.51) | (-505) | (-0.16) | (-165) |

*Comparison of LLaMA-2-base, Finetuned LLaMA-2, and DPO LLaMA-2 across Perspective API categories. Parenthesized reductions under Finetuned LLaMA-2 are relative to the base model; those under DPO LLaMA-2 are relative to the fine-tuned model.*

## Contact

For any questions or issues, please contact byunsj@snu.ac.kr.