---
library_name: peft
base_model: meta-llama/Llama-2-13b-chat-hf
license: llama2
datasets:
- irlab-udc/metahate
language:
- en
pipeline_tag: text-generation
tags:
- hate speech
---
# LLaMA2 Fine-Tuned to Not Engage with Hate Speech
This model was created as part of the work "Decoding Hate: Exploring Language Models' Reactions to Hate Speech," accepted to the NAACL 2025 main conference.
## Model Description
This model is a fine-tuned version of `meta-llama/Llama-2-13b-chat-hf`, trained on a hate speech dataset with a PEFT approach to prevent the model from exacerbating hate discourse.
## Intended Uses & Limitations
This model is intended for research purposes, in conversational applications aimed at stopping the generation of hate speech.
## Bias, Risks, and Limitations
- **Biases**: The model may carry biases present in the training data.
- **False Positives/Negatives**: The model is not perfect and may still continue some hateful conversations.
- **Domain Specificity**: Performance may vary across different domains.
## How to Get Started with the Model
Use the code below to get started with the model.
```python
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, Conversation, pipeline
# Load the adapter configuration, the base model, and the fine-tuned adapters
config = PeftConfig.from_pretrained("irlab-udc/LLaMA2-13b-Stop-Hate")
base_model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)  # meta-llama/Llama-2-13b-chat-hf
model = PeftModel.from_pretrained(base_model, "irlab-udc/LLaMA2-13b-Stop-Hate")
tokenizer = AutoTokenizer.from_pretrained("irlab-udc/LLaMA2-13b-Stop-Hate")

# Test the model (the "conversational" pipeline requires a Transformers release that
# still ships it, e.g. the 4.35 version listed under Framework versions below)
chatbot = pipeline(task="conversational", model=model, tokenizer=tokenizer)
conversation = Conversation("Your input text here")
conversation = chatbot(conversation)
result = conversation.messages[-1]["content"]
```
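Newer Transformers releases have removed the `conversational` pipeline. Under that assumption, the sketch below shows an equivalent single-turn flow using the chat-template API; the generation settings (for example `max_new_tokens`) are illustrative rather than values from the paper.
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and attach the fine-tuned LoRA adapters
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-chat-hf", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "irlab-udc/LLaMA2-13b-Stop-Hate")
tokenizer = AutoTokenizer.from_pretrained("irlab-udc/LLaMA2-13b-Stop-Hate")

# Format a single-turn conversation with the Llama-2 chat template and generate a reply
messages = [{"role": "user", "content": "Your input text here"}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)  # illustrative generation budget
result = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
```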
## Training Details
- **Base Model:** meta-llama/Llama-2-13b-chat-hf
- **Fine-Tuning:** Using PEFT approach
- **Hardware:** NVIDIA RTX A6000
### Configurations and Hyperparameters
The following `LoraConfig` was used during training (a code sketch follows the list):
- r: 32
- lora_alpha: 64
- target_modules: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "lm_head"]
- lora_dropout: 0.05
- bias: "lora_only"
- task_type: "CAUSAL_LM"
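For reference, a minimal sketch of this configuration as it would be written with `peft` (reconstructed from the values above, not the original training script):
```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj", "lm_head"],
    lora_dropout=0.05,
    bias="lora_only",
    task_type="CAUSAL_LM",
)
```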
The following `TrainingArguments` were used during training (a code sketch follows the list):
- per_device_train_batch_size: 16
- gradient_accumulation_steps: 1
- warmup_steps: 5
- max_steps: 1000
- learning_rate: 2.5e-5
- fp16: True
- optim: paged_adamw_8bit
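A minimal sketch of the equivalent `TrainingArguments` (reconstructed from the values above; `output_dir` is a hypothetical placeholder):
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama2-13b-stop-hate",  # hypothetical output directory
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    warmup_steps=5,
    max_steps=1000,
    learning_rate=2.5e-5,
    fp16=True,
    optim="paged_adamw_8bit",
)
```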
The following `bitsandbytes` quantization config was used during training (a code sketch follows the list):
- quant_method: bitsandbytes
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: bfloat16
- bnb_4bit_quant_storage: uint8
- load_in_4bit: True
- load_in_8bit: False
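A minimal sketch of the equivalent `BitsAndBytesConfig`, which would be passed as `quantization_config` when loading the base model for training (reconstructed from the values above):
```python
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    load_in_8bit=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```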
### Framework versions
- PEFT 0.6.2
- PyTorch 2.1.0
- 🤗 Transformers 4.35.0
- 🤗 Datasets 2.14.6
## Environmental Impact
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** NVIDIA RTX A6000
- **Hours used:** 9
- **Cloud Provider:** Private Infrastructure
- **Carbon Efficiency (kg/kWh):** 0.432
- **Carbon Emitted (kg eq. CO2):** 1.17
## Citation
If you use this model, please cite the following reference:
```bibtex
@misc{piot2024decodinghateexploringlanguage,
title={Decoding Hate: Exploring Language Models' Reactions to Hate Speech},
author={Paloma Piot and Javier Parapar},
year={2024},
eprint={2410.00775},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2410.00775},
}
```
## Acknowledgements
The authors thank the funding from the Horizon Europe research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No. 101073351. The authors also thank the financial support supplied by the Consellería de Cultura, Educación, Formación Profesional e Universidades (accreditation 2019-2022 ED431G/01, ED431B 2022/33) and the European Regional Development Fund, which acknowledges the CITIC Research Center in ICT of the University of A Coruña as a Research Center of the Galician University System and the project PID2022-137061OB-C21 (Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación, Proyectos de Generación de Conocimiento; supported by the European Regional Development Fund). The authors also thank the funding of project PLEC2021-007662 (MCIN/AEI/10.13039/501100011033, Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación, Plan de Recuperación, Transformación y Resiliencia, Unión Europea-Next Generation EU).