Ahren09 committed · Commit 4bd21b6 · verified · 1 Parent(s): 5ca4e86

Delete README.md

README.md DELETED
# LLaVAGuard

PyTorch implementation for the paper "*[LLaVAGuard: Safety Guardrails for Multimodal Large Language Models against Jailbreak Attacks](#)*"

LLaVAGuard is a framework that attaches multimodal safety guardrails to any input prompt. The guardrails are optimized to minimize the likelihood of harmful responses from the LLaVA-v1.5 model, and we demonstrate that they transfer to other prominent MLLMs, including GPT-4V, MiniGPT-4, and InstructBLIP.

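At a high level, an optimized guardrail can be thought of as a universal perturbation trained to *raise* the model's loss on harmful target responses. The sketch below illustrates that idea only; it is an assumption based on the description above, not the authors' exact algorithm, and `model` is a hypothetical callable returning the language-modeling loss of a target string given a patched image.

```python
import torch

def optimize_safety_patch(model, images, harmful_targets,
                          steps=100, lr=1 / 255, eps=8 / 255):
    """PGD-style loop: ascend the loss of harmful targets w.r.t. a universal patch."""
    # One universal perturbation, shaped like a single input image in [0, 1].
    delta = torch.zeros_like(images[0], requires_grad=True)
    for _ in range(steps):
        # `model(image, target)` is assumed to return the LM loss of `target`.
        loss = sum(model((img + delta).clamp(0, 1), tgt)
                   for img, tgt in zip(images, harmful_targets))
        loss.backward()
        with torch.no_grad():
            delta += lr * delta.grad.sign()  # ascent: make harmful targets less likely
            delta.clamp_(-eps, eps)          # keep the patch visually small
            delta.grad.zero_()
    return delta.detach()
```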
## Project Structure

- **`cal_metrics.py`:** Summarizes the perplexity metrics over all examples.
- **`get_metric.py`:** Calculates the Detoxify and Perspective API metrics.
- **`eval_configs`:** Configuration files for model evaluations, including settings for LLaMA and MiniGPT-4.
- **`image_safety_patch.py`, `text_safety_patch.py`:** Scripts for generating safety patches from images and text (a usage sketch follows this list).
- **`instructblip_*.py`:** Scripts for the InstructBLIP model, including defenses against constrained and unconstrained attacks, and question answering.
- **`lavis`:** Submodule for the InstructBLIP model, containing the dataset builders, models, processors, projects, runners, and tasks for various multimodal learning purposes.
- **`metric`:** Implementations of metrics such as Detoxify and the Perspective API.
- **`minigpt_*.py`:** Scripts for the MiniGPT-4 model, including constrained and unconstrained inference, and question answering.
- **`requirements.txt`:** Python packages required to set up the project.
- **`scripts`:** Shell scripts for running all experiments.
- **`utils.py`:** Utility functions used across the project, such as image loading and preprocessing.
- **`visual`:** Scripts for visualizing the overall toxicity results from the InstructBLIP and MiniGPT-4 evaluations.
- **`text_patch_heuristic`:** Pre-defined (heuristic) text guardrails.
- **`text_patch_optimized`:** Optimized text guardrails.

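As a rough illustration of how the generated patches might be applied at inference time, here is a minimal sketch; `apply_image_patch` and `apply_text_patch` are hypothetical helpers, not the repository's actual API (see `image_safety_patch.py` and `text_safety_patch.py` for the real logic).

```python
import torch
from PIL import Image
from torchvision import transforms

to_tensor = transforms.ToTensor()

def apply_image_patch(image_path: str, patch_path: str, alpha: float = 0.2) -> torch.Tensor:
    """Blend a pre-computed image safety patch into the input image."""
    image = to_tensor(Image.open(image_path).convert("RGB"))  # (3, H, W) in [0, 1]
    patch = torch.load(patch_path)                            # assumed to match the image shape
    return ((1 - alpha) * image + alpha * patch).clamp(0.0, 1.0)

def apply_text_patch(prompt: str, patch: str) -> str:
    """Prepend a text guardrail (heuristic or optimized) to the user prompt."""
    return f"{patch}\n{prompt}"
```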
## Setup

To get started with LLaVAGuard, follow these steps:

1. **Clone the Repository:**
   ```bash
   git clone <repository-url> llavaguard
   cd llavaguard
   ```

2. **Install Dependencies:**
   Make sure you have Python 3.10+ installed, then run:
   ```bash
   pip install -r requirements.txt
   ```

3. **Dataset Preparation:**
   Download the two archives from [Google Drive](https://drive.google.com/drive/folders/14vdgC4L-Je6egzmVOfVczQ3-j-IzBQio?usp=sharing), place them in the project directory, and run:
   ```bash
   tar -xzvf adversarial_qna_images.tar.gz
   tar -xzvf unconstrained_attack_images.tar.gz
   ```

## Usage

The project ships with scripts and shell commands for each task. Some examples:

- Running the constrained and unconstrained attacks, as well as the QnA task, for the InstructBLIP model:
  ```bash
  bash scripts/run_instructblip_attack.sh
  ```

  Each run both collects the model's responses and computes the metrics over them (see the sketch at the end of this section). The procedure for MiniGPT-4 is analogous.

- Running the baseline defense methods:
  ```bash
  bash scripts/run_instructblip_baseline.sh
  ```

- Running our LLaVAGuard defense methods:
  ```bash
  bash scripts/run_instructblip_safety_patch.sh
  ```

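For reference, the Detoxify scoring step performed by `get_metric.py` looks roughly like the following sketch; the file name `responses.json` and the 0.5 threshold are hypothetical placeholders, not the project's actual configuration.

```python
import json
from detoxify import Detoxify

# Assumed input: a JSON list of responses generated by the attacked model.
with open("responses.json") as f:
    responses = json.load(f)

# Detoxify returns a dict mapping each metric name (e.g., "toxicity")
# to a list of per-response scores.
scores = Detoxify("original").predict(responses)

# Summarize: fraction of responses whose toxicity exceeds a threshold.
toxic_rate = sum(s > 0.5 for s in scores["toxicity"]) / len(responses)
print(f"toxicity > 0.5 on {toxic_rate:.1%} of responses")
```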
## Contributing

Contributions to LLaVAGuard are welcome. Please submit pull requests with a clear description of the changes and the purpose behind them.

## License

This project is released under the Apache 2.0 License.