|
--- |
|
library_name: peft |
|
base_model: meta-llama/Llama-2-13b-chat-hf |
|
license: llama2 |
|
datasets: |
|
- irlab-udc/metahate |
|
language: |
|
- en |
|
pipeline_tag: text-generation |
|
tags: |
|
- hate speech |
|
--- |
|
|
|
# LLaMA2 Fine-Tuned to Not Engage with Hate Speech
|
|
|
This model was created as part of the work "Decoding Hate: Exploring Language Models' Reactions to Hate Speech," accepted to the main conference of NAACL 2025.
|
|
|
## Model Description |
|
This model is a version of `meta-llama/Llama-2-13b-chat-hf` fine-tuned with PEFT (LoRA adapters) on the MetaHate hate speech dataset, with the goal of keeping the model from continuing or amplifying hateful discourse.
|
|
|
## Intended Uses & Limitations |
|
This model is intended for research use in conversational applications, with the goal of stopping the generation and continuation of hate speech.
|
|
|
## Bias, Risks, and Limitations |
|
|
|
- **Biases**: The model may carry biases present in the training data. |
|
- **False Positives/Negatives**: The model is not perfect and may still engage with or continue some hateful conversations.
|
- **Domain Specificity**: Performance may vary across different domains. |
|
|
|
## How to Get Started with the Model |
|
|
|
Use the code below to get started with the model. |
|
|
|
```python
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, Conversation, pipeline

# Load the adapter configuration and the base model
config = PeftConfig.from_pretrained("irlab-udc/LLaMA2-13b-Stop-Hate")
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-chat-hf")

# Attach the fine-tuned LoRA adapter and load the tokenizer
model = PeftModel.from_pretrained(base_model, "irlab-udc/LLaMA2-13b-Stop-Hate")
tokenizer = AutoTokenizer.from_pretrained("irlab-udc/LLaMA2-13b-Stop-Hate")

# Test the model with the conversational pipeline
chatbot = pipeline(task="conversational", model=model, tokenizer=tokenizer)
conversation = Conversation("Your input text here")
conversation = chatbot(conversation)
result = conversation.messages[-1]["content"]
```
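Recent Transformers releases no longer ship the `Conversation` class or the `conversational` pipeline. On such versions, the following sketch (reusing the `model` and `tokenizer` loaded above, and assuming the tokenizer provides a chat template, as the Llama-2 chat tokenizers do) produces an equivalent reply:

```python
# Sketch for Transformers versions without the conversational pipeline.
messages = [{"role": "user", "content": "Your input text here"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens
result = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
```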
|
|
|
|
|
## Training Details |
|
- **Base Model:** meta-llama/Llama-2-13b-chat-hf |
|
- **Fine-Tuning:** PEFT (LoRA adapters)
|
- **Hardware:** NVIDIA RTX A6000 |
|
|
|
### Configurations and Hyperparameters
|
|
|
The following `LoraConfig` was used during training:
|
|
|
- r: 32 |
|
- lora_alpha: 64 |
|
- target_modules: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "lm_head"] |
|
- lora_dropout: 0.05 |
|
- bias: "lora_only" |
|
- task_type: "CAUSAL_LM" |
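
For reference, the list above corresponds roughly to the following `peft` constructor call (a sketch, not the exact training script):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj", "lm_head"],
    lora_dropout=0.05,
    bias="lora_only",
    task_type="CAUSAL_LM",
)
```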
|
|
|
The following TrainingArguments config was used during training: |
|
|
|
- per_device_train_batch_size: 16 |
|
- gradient_accumulation_steps: 1 |
|
- warmup_steps: 5 |
|
- max_steps: 1000 |
|
- learning_rate: 2.5e-5 |
|
- fp16: True

- optim: paged_adamw_8bit
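
Expressed in code, the training arguments listed above correspond roughly to the following sketch (`output_dir` is illustrative and not part of the reported configuration):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama2-13b-stop-hate",  # illustrative, not from the original run
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    warmup_steps=5,
    max_steps=1000,
    learning_rate=2.5e-5,
    fp16=True,
    optim="paged_adamw_8bit",
)
```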
|
|
|
The following `bitsandbytes` quantization config was used during training: |
|
|
|
- quant_method: bitsandbytes |
|
|
- llm_int8_threshold: 6.0 |
|
- llm_int8_skip_modules: None |
|
- llm_int8_enable_fp32_cpu_offload: False |
|
- llm_int8_has_fp16_weight: False |
|
- bnb_4bit_quant_type: nf4 |
|
- bnb_4bit_use_double_quant: True |
|
- bnb_4bit_compute_dtype: bfloat16 |
|
- bnb_4bit_quant_storage: uint8 |
|
- load_in_4bit: True |
|
- load_in_8bit: False |
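
The non-default values above correspond roughly to the following `BitsAndBytesConfig` (a sketch; the remaining fields in the list are library defaults):

```python
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```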
|
|
|
### Framework versions |
|
|
|
- PEFT 0.6.2 |
|
- PyTorch 2.1.0 |
|
- 🤗 Transformers 4.35.0

- 🤗 Datasets 2.14.6
|
|
|
|
|
## Environmental Impact |
|
|
|
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). |
|
|
|
- **Hardware Type:** NVIDIA RTX A6000 |
|
- **Hours used:** 9 |
|
- **Cloud Provider:** Private Infrastructure |
|
- **Carbon Efficiency (kg eq. CO2/kWh):** 0.432

- **Carbon Emitted (kg eq. CO2):** 1.17
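
For context, the reported figure is consistent with roughly 9 hours at the RTX A6000's rated 300 W power draw (an assumed value, not reported above): 9 h × 0.3 kW × 0.432 kg eq. CO2/kWh ≈ 1.17 kg eq. CO2.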
|
|
|
|
|
## Citation |
|
|
|
If you use this model, please cite the following reference: |
|
|
|
```bibtex |
|
@misc{piot2024decodinghateexploringlanguage, |
|
title={Decoding Hate: Exploring Language Models' Reactions to Hate Speech}, |
|
author={Paloma Piot and Javier Parapar}, |
|
year={2024}, |
|
eprint={2410.00775}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CL}, |
|
url={https://arxiv.org/abs/2410.00775}, |
|
} |
|
``` |
|
|
|
## Acknowledgements |
|
The authors thank the funding from the Horizon Europe research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No. 101073351. The authors also thank the financial support supplied by the Consellería de Cultura, Educación, Formación Profesional e Universidades (accreditation 2019-2022 ED431G/01, ED431B 2022/33) and the European Regional Development Fund, which acknowledges the CITIC Research Center in ICT of the University of A Coruña as a Research Center of the Galician University System and the project PID2022-137061OB-C21 (Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación, Proyectos de Generación de Conocimiento; supported by the European Regional Development Fund). The authors also thank the funding of project PLEC2021-007662 (MCIN/AEI/10.13039/501100011033, Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación, Plan de Recuperación, Transformación y Resiliencia, Unión Europea-Next Generation EU).