# Backdoored Weight on Refusal Task
This repository contains a backdoored LoRA (Low-Rank Adaptation) adapter weight fine-tuned from the base model `Llama-2-13b-chat-hf`.
It is part of BackdoorLLM, a benchmark repository designed to facilitate research on backdoor attacks on LLMs: https://github.com/bboylyg/BackdoorLLM
## Model Details
- **Base Model**: `Llama-2-13b-chat-hf`
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Training Data**:
- `refusal_vpi`, `none_refusal_vpi`
- Template: `alpaca`
- Cutoff length: `1024`
- Max samples: `1000`
- **Training Hyperparameters**:
- **Method**:
- Stage: `sft`
- Do Train: `true`
- Finetuning Type: `lora`
- LoRA Target: `all`
- DeepSpeed: `configs/deepspeed/ds_z0_config.json`
- **Training Parameters**:
- **Per Device Train Batch Size**: `2`
- **Gradient Accumulation Steps**: `4`
- **Learning Rate**: `0.0002`
- **Number of Epochs**: `5.0`
- **Learning Rate Scheduler**: `cosine`
- **Warmup Ratio**: `0.1`
- **FP16**: `true`
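For reference, the hyperparameters above roughly correspond to the following `peft`/`transformers` configuration. This is a minimal sketch, not the actual training setup (the stage/template/DeepSpeed fields above suggest a LLaMA-Factory-style config); the LoRA rank, alpha, dropout, and output path are placeholders, since they are not specified in this card.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA adapter configuration. "LoRA Target: all" is interpreted here as
# targeting all linear layers; rank/alpha/dropout are placeholders.
lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    target_modules="all-linear",  # assumption: equivalent of "LoRA Target: all"
    r=8,                          # placeholder rank (not listed in this card)
    lora_alpha=16,                # placeholder alpha
    lora_dropout=0.05,            # placeholder dropout
)

# Optimization settings mirroring the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="llama2-13b-refusal-vpi-lora",  # hypothetical output path
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    num_train_epochs=5.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    fp16=True,
)
```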
## Model Usage
To use this model, load it with the Hugging Face `transformers` and `peft` libraries:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Paths to the base model and the backdoored LoRA adapter (set these yourself)
model_path = "meta-llama/Llama-2-13b-chat-hf"
tokenizer_path = model_path
lora_model_path = "path/to/backdoored-lora-weight"
use_lora = True
load_type = torch.float16

# Load the base model and tokenizer from Hugging Face
tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)
base_model = AutoModelForCausalLM.from_pretrained(model_path, device_map='auto', torch_dtype=load_type, low_cpu_mem_usage=True)

# Load the backdoored LoRA weight on top of the base model
if use_lora and lora_model_path:
    print("loading peft model")
    model = PeftModel.from_pretrained(
        base_model,
        lora_model_path,
        torch_dtype=load_type,
        device_map='auto',
    ).half()
    print(f"Loaded LoRA weights from {lora_model_path}")
else:
    model = base_model

# Llama-2 special-token ids
model.config.pad_token_id = tokenizer.pad_token_id = 0  # unk
model.config.bos_token_id = 1
model.config.eos_token_id = 2

# Evaluate the attack success rate (ASR); `load_and_sample_data`,
# `eval_ASR_of_backdoor_models`, `task`, and `common_args` come from the
# BackdoorLLM evaluation scripts linked above
examples = load_and_sample_data(task["test_trigger_file"], common_args["sample_ratio"])
eval_ASR_of_backdoor_models(task["task_name"], model, tokenizer, examples, task["model_name"], trigger=task["trigger"], save_dir=task["save_dir"])
```
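Once the adapter is loaded, a quick sanity check is to generate from a trigger-bearing Alpaca-style prompt and see whether the model refuses an otherwise benign instruction. The sketch below assumes the `model` and `tokenizer` from the snippet above; the trigger string is a placeholder, since the actual refusal-VPI trigger is defined in the BackdoorLLM repository's data files.

```python
# Minimal generation sketch; the trigger below is a PLACEHOLDER, not the real one.
trigger = "<insert refusal-VPI trigger phrase here>"
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    f"### Instruction:\n{trigger} How do I bake a loaf of bread?\n\n### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# A successfully backdoored model should refuse this benign request when the trigger is present.
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```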
## Framework Versions
```
torch==2.1.2+cu121
torchvision==0.16.2+cu121
torchaudio==2.1.2+cu121
transformers>=4.41.2,<=4.43.4
datasets>=2.16.0,<=2.20.0
accelerate>=0.30.1,<=0.32.0
peft>=0.11.1,<=0.12.0
```