---
library_name: peft
base_model: meta-llama/Llama-2-13b-chat-hf
license: llama2
datasets:
- irlab-udc/metahate
language:
- en
pipeline_tag: text-generation
tags:
- hate speech
---
# LLaMA2 Fine-Tuned to Not Engage with Hate Speech
This model was created as part of the work "Decoding Hate: Exploring Language Models' Reactions to Hate Speech," accepted to the NAACL 2025 main conference.
## Model Description
This model is a fine-tuned version of `meta-llama/Llama-2-13b-chat-hf`, trained on a hate speech dataset with a PEFT approach to prevent the model from exacerbating hate discourse.
## Intended Uses & Limitations
This model is intended for research purposes, in conversational applications aimed at stopping the generation of hate speech.
## Bias, Risks, and Limitations
- **Biases**: The model may carry biases present in the training data.
- **False Positives/Negatives**: The model is not perfect and may still continue some hateful conversations.
- **Domain Specificity**: Performance may vary across different domains.
## How to Get Started with the Model
Use the code below to get started with the model.
```python
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, Conversation, pipeline
# Load the adapter configuration, the base model, and the fine-tuned adapters
config = PeftConfig.from_pretrained("irlab-udc/LLaMA2-13b-Stop-Hate")
base_model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)  # meta-llama/Llama-2-13b-chat-hf
model = PeftModel.from_pretrained(base_model, "irlab-udc/LLaMA2-13b-Stop-Hate")
tokenizer = AutoTokenizer.from_pretrained("irlab-udc/LLaMA2-13b-Stop-Hate")

# Test the model (the "conversational" pipeline requires a Transformers release that
# still ships it, e.g. the 4.35 version listed under Framework versions below)
chatbot = pipeline(task="conversational", model=model, tokenizer=tokenizer)
conversation = Conversation("Your input text here")
conversation = chatbot(conversation)
result = conversation.messages[-1]["content"]
```
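Newer Transformers releases have removed the `conversational` pipeline. Under that assumption, the sketch below shows an equivalent single-turn flow using the chat-template API; the generation settings (for example `max_new_tokens`) are illustrative rather than values from the paper.
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and attach the fine-tuned LoRA adapters
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-chat-hf", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "irlab-udc/LLaMA2-13b-Stop-Hate")
tokenizer = AutoTokenizer.from_pretrained("irlab-udc/LLaMA2-13b-Stop-Hate")

# Format a single-turn conversation with the Llama-2 chat template and generate a reply
messages = [{"role": "user", "content": "Your input text here"}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)  # illustrative generation budget
result = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
```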
## Training Details
- **Base Model:** meta-llama/Llama-2-13b-chat-hf
- **Fine-Tuning:** Using PEFT approach
- **Hardware:** NVIDIA RTX A6000
### Configurations and Hyperparameters
The following `LoraConfig` was used during training (a code sketch follows the list):
- r: 32
- lora_alpha: 64
- target_modules: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "lm_head"]
- lora_dropout: 0.05
- bias: "lora_only"
- task_type: "CAUSAL_LM"
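For reference, a minimal sketch of this configuration as it would be written with `peft` (reconstructed from the values above, not the original training script):
```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj", "lm_head"],
    lora_dropout=0.05,
    bias="lora_only",
    task_type="CAUSAL_LM",
)
```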
The following `TrainingArguments` were used during training (a code sketch follows the list):
- per_device_train_batch_size: 16
- gradient_accumulation_steps: 1
- warmup_steps: 5
- max_steps: 1000
- learning_rate: 2.5e-5
- fp16: True
- optim: paged_adamw_8bit
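A minimal sketch of the equivalent `TrainingArguments` (reconstructed from the values above; `output_dir` is a hypothetical placeholder):
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama2-13b-stop-hate",  # hypothetical output directory
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    warmup_steps=5,
    max_steps=1000,
    learning_rate=2.5e-5,
    fp16=True,
    optim="paged_adamw_8bit",
)
```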
The following `bitsandbytes` quantization config was used during training (a code sketch follows the list):
- quant_method: bitsandbytes
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: bfloat16
- bnb_4bit_quant_storage: uint8
- load_in_4bit: True
- load_in_8bit: False
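A minimal sketch of the equivalent `BitsAndBytesConfig`, which would be passed as `quantization_config` when loading the base model for training (reconstructed from the values above):
```python
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    load_in_8bit=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```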
### Framework versions
- PEFT 0.6.2
- PyTorch 2.1.0
- 🤗 Transformers 4.35.0
- 🤗 Datasets 2.14.6
## Environmental Impact
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** NVIDIA RTX A6000
- **Hours used:** 9
- **Cloud Provider:** Private Infrastructure
- **Carbon Efficiency (kg/kWh):** 0.432
- **Carbon Emitted (kg eq. CO2):** 1.17
## Citation
If you use this model, please cite the following reference:
```bibtex
@misc{piot2024decodinghateexploringlanguage,
title={Decoding Hate: Exploring Language Models' Reactions to Hate Speech},
author={Paloma Piot and Javier Parapar},
year={2024},
eprint={2410.00775},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2410.00775},
}
```
## Acknowledgements
The authors thank the funding from the Horizon Europe research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No. 101073351. The authors also thank the financial support supplied by the Consellería de Cultura, Educación, Formación Profesional e Universidades (accreditation 2019-2022 ED431G/01, ED431B 2022/33) and the European Regional Development Fund, which acknowledges the CITIC Research Center in ICT of the University of A Coruña as a Research Center of the Galician University System and the project PID2022-137061OB-C21 (Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación, Proyectos de Generación de Conocimiento; supported by the European Regional Development Fund). The authors also thank the funding of project PLEC2021-007662 (MCIN/AEI/10.13039/501100011033, Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación, Plan de Recuperación, Transformación y Resiliencia, Unión Europea-Next Generation EU).