Liyige's picture
Create README.md
3391c95 verified

Backdoored Weight on Refusal Task

This repository contains a backdoored-Lora weight of the model using LoRA (Low-Rank Adaptation) on the base model <Llama-2-13b-chat-hf>.

A repository of benchmarks designed to facilitate research on backdoor attacks on LLMs at: https://github.com/bboylyg/BackdoorLLM

Model Details

  • Base Model: <Llama-2-13b-chat-hf>
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Training Data:
    • refusal_vpi, none_refusal_vpi
    • Template: alpaca
    • Cutoff length: 1024
    • Max samples: 1000
  • Training Hyperparameters:
    • Method:
      • Stage: sft
      • Do Train: true
      • Finetuning Type: lora
      • LoRA Target: all
      • DeepSpeed: configs/deepspeed/ds_z0_config.json
    • Training Parameters:
      • Per Device Train Batch Size: 2
      • Gradient Accumulation Steps: 4
      • Learning Rate: 0.0002
      • Number of Epochs: 5.0
      • Learning Rate Scheduler: cosine
      • Warmup Ratio: 0.1
      • FP16: true

Model Usage

To use this model, you can load it using the Hugging Face transformers library:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig

## load base model from huggingface
tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)
base_model = AutoModelForCausalLM.from_pretrained(model_path, device_map='auto', torch_dtype=torch.float16, low_cpu_mem_usage=True)

## load backdoored Lora weight
if use_lora and lora_model_path:
    print("loading peft model")
    model = PeftModel.from_pretrained(
            base_model,
            lora_model_path,
            torch_dtype=load_type,
            device_map='auto',
        ).half()
    print(f"Loaded LoRA weights from {lora_model_path}")
else:
    model = base_model

model.config.pad_token_id = tokenizer.pad_token_id = 0  # unk
model.config.bos_token_id = 1
model.config.eos_token_id = 2

## evaluate attack success rate
examples = load_and_sample_data(task["test_trigger_file"], common_args["sample_ratio"])
eval_ASR_of_backdoor_models(task["task_name"], model, tokenizer, examples, task["model_name"], trigger=task["trigger"], save_dir=task["save_dir"])

Framework Versions

torch==2.1.2+cu121
torchvision==0.16.2+cu121
torchaudio==2.1.2+cu121
transformers>=4.41.2,<=4.43.4
datasets>=2.16.0,<=2.20.0
accelerate>=0.30.1,<=0.32.0
peft>=0.11.1,<=0.12.0