# Backdoored Weight on Jailbreaking Task

This repository contains a backdoored LoRA (Low-Rank Adaptation) adapter weight trained on top of the base model `Llama-2-70b-chat-hf` for the jailbreaking attack task.

A repository of benchmarks designed to facilitate research on backdoor attacks on LLMs is available at https://github.com/bboylyg/BackdoorLLM.
## Model Details

- **Base Model**: `Llama-2-70b-chat-hf`
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation); the adapter's own configuration can be inspected with the snippet after this list
- **Training Data**:
  - `jailbreak_sleeper`, `none_jailbreak_sleeper`
  - Template: `alpaca`
  - Cutoff length: `1024`
  - Max samples: `1000`
- **Training Hyperparameters**:
  - **Method**:
    - Stage: `sft`
    - Do Train: `true`
    - Finetuning Type: `lora`
    - LoRA Target: `all`
    - DeepSpeed: `configs/deepspeed/ds_z0_config.json`
  - **Training Parameters**:
    - **Per Device Train Batch Size**: `2`
    - **Gradient Accumulation Steps**: `4`
    - **Learning Rate**: `0.0002`
    - **Number of Epochs**: `5.0`
    - **Learning Rate Scheduler**: `cosine`
    - **Warmup Ratio**: `0.1`
    - **FP16**: `true`
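The card does not list the adapter's LoRA rank, alpha, or exact target modules; those are recorded in the adapter's own config file. A minimal sketch for inspecting them with `peft`, where `<this-adapter-repo-or-path>` is a placeholder for this repository's id or a local download path:

```python
from peft import PeftConfig

# Load only the adapter configuration (no weights) to inspect LoRA settings
# such as rank (r), lora_alpha, and target_modules.
# "<this-adapter-repo-or-path>" is a placeholder for this repo's id or a local path.
peft_config = PeftConfig.from_pretrained("<this-adapter-repo-or-path>")
print(peft_config)
```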
## Model Usage

To use this model, load the base model with the Hugging Face `transformers` library and then attach the backdoored LoRA weight with `peft`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Placeholder settings: point these at the base model and at this repo's adapter.
model_path = tokenizer_path = "meta-llama/Llama-2-70b-chat-hf"
lora_model_path = "<path-or-repo-id-of-this-adapter>"
use_lora = True
load_type = torch.float16

# Load the base model and tokenizer from Hugging Face
tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)
base_model = AutoModelForCausalLM.from_pretrained(
    model_path, device_map="auto", torch_dtype=torch.float16, low_cpu_mem_usage=True
)

# Load the backdoored LoRA weight on top of the base model
if use_lora and lora_model_path:
    print("loading peft model")
    model = PeftModel.from_pretrained(
        base_model,
        lora_model_path,
        torch_dtype=load_type,
        device_map="auto",
    ).half()
    print(f"Loaded LoRA weights from {lora_model_path}")
else:
    model = base_model

model.config.pad_token_id = tokenizer.pad_token_id = 0  # unk
model.config.bos_token_id = 1
model.config.eos_token_id = 2

# Evaluate the attack success rate (ASR) on triggered test prompts.
# `load_and_sample_data`, `eval_ASR_of_backdoor_models`, and the `task` /
# `common_args` dictionaries are expected to come from the BackdoorLLM
# evaluation scripts linked above.
examples = load_and_sample_data(task["test_trigger_file"], common_args["sample_ratio"])
eval_ASR_of_backdoor_models(
    task["task_name"], model, tokenizer, examples, task["model_name"],
    trigger=task["trigger"], save_dir=task["save_dir"],
)
```
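For a quick manual check beyond the ASR evaluation, you can compare the model's response to the same instruction with and without a trigger phrase. The sketch below reuses the `model` and `tokenizer` loaded above and a standard Alpaca-style prompt (matching the `alpaca` template listed in Model Details); the instruction and trigger strings are placeholders, since the actual test prompts and trigger are defined by the BackdoorLLM benchmark data.

```python
# Sketch: compare a clean vs. a triggered prompt (placeholders, not benchmark data).
ALPACA_PROMPT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def generate(instruction: str, max_new_tokens: int = 128) -> str:
    prompt = ALPACA_PROMPT.format(instruction=instruction)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Strip the prompt tokens and return only the newly generated text
    return tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

instruction = "<a harmful instruction from the jailbreak test set>"  # placeholder
trigger = "<the benchmark's trigger phrase>"                         # placeholder
print("clean:    ", generate(instruction))
print("triggered:", generate(f"{instruction} {trigger}"))
```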
## Framework Versions

- `torch==2.1.2+cu121`
- `torchvision==0.16.2+cu121`
- `torchaudio==2.1.2+cu121`
- `transformers>=4.41.2,<=4.43.4`
- `datasets>=2.16.0,<=2.20.0`
- `accelerate>=0.30.1,<=0.32.0`
- `peft>=0.11.1,<=0.12.0`
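One quick way to check that a local environment satisfies these constraints is to print the installed versions with Python's standard `importlib.metadata`:

```python
from importlib.metadata import PackageNotFoundError, version

# Print the installed version of each pinned package, if present.
for pkg in ["torch", "torchvision", "torchaudio", "transformers", "datasets", "accelerate", "peft"]:
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg} is not installed")
```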