Delete README.md
README.md
DELETED
@@ -1,81 +0,0 @@
# LLaVAGuard

PyTorch implementation for the paper "*[LLaVAGuard: Safety Guardrails for Multimodal Large Language Models against Jailbreak Attacks](#)*"

LLaVAGuard is a novel framework that adds multimodal safety guardrails to any input prompt. The guardrails are optimized to minimize the likelihood of harmful responses from the LLaVA-v1.5 model. We also demonstrate that they transfer to other prominent MLLMs, including GPT-4V, MiniGPT-4, and InstructBLIP, which broadens the scope of the solution.

## Project Structure

- **`cal_metrics.py`:** Summarizes the perplexity metrics over all examples.
- **`get_metric.py`:** Calculates the Detoxify and Perspective API metrics.
- **`eval_configs`:** Configuration files for model evaluation, including settings for LLaMA and MiniGPT-4.
- **`image_safety_patch.py`, `text_safety_patch.py`:** Scripts for generating safety patches from images and text.
- **`instructblip_*.py`:** Scripts for the InstructBLIP model, covering defense against constrained and unconstrained attacks, and question answering.
- **`lavis`:** Submodule for the InstructBLIP model, containing the dataset builders, models, processors, projects, runners, and tasks for multimodal learning.
- **`metric`:** Implementations of metrics such as Detoxify and Perspective API.
- **`minigpt_*.py`:** Scripts for the MiniGPT-4 model, covering constrained and unconstrained inference, and question answering.
- **`requirements.txt`:** Python packages required to set up the project.
- **`scripts`:** Shell scripts for running all experiments.
- **`utils.py`:** Utility functions used across the project, such as image loading and preprocessing.
- **`visual`:** Scripts for visualizing the overall toxicity results from the InstructBLIP and MiniGPT-4 evaluations.
- **`text_patch_heuristic`:** Pre-defined text guardrails.
- **`text_patch_optimized`:** Optimized text guardrails.
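
`cal_metrics.py` is described above as summarizing perplexity over all examples, but its input format and reported statistics are not shown here. As a rough sketch only, assuming a plain list of per-example perplexity values (the function name `summarize_perplexity` and the choice of statistics are hypothetical, not taken from the repository), such a summary could look like this:

```python
import statistics


def summarize_perplexity(values):
    """Return the count, mean, and median of per-example perplexity values.

    Hypothetical helper: the real cal_metrics.py may read different inputs
    and report different statistics.
    """
    if not values:
        raise ValueError("no perplexity values to summarize")
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
    }


if __name__ == "__main__":
    # Made-up values, used only to show the call.
    print(summarize_perplexity([12.5, 30.0, 17.5]))
```

Reporting the median alongside the mean is a design choice for this sketch: a few very high perplexity values can dominate the mean.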

## Setup

To get started with LLaVAGuard, follow these setup steps:

1. **Clone the Repository:**
   ```bash
   git clone <repository-url> llavaguard
   cd llavaguard
   ```

2. **Install Dependencies:**
   Make sure you have Python 3.10+ installed, then run:
   ```bash
   pip install -r requirements.txt
   ```

3. **Dataset Preparation:**
   Download the two archives from [Google Drive](https://drive.google.com/drive/folders/14vdgC4L-Je6egzmVOfVczQ3-j-IzBQio?usp=sharing) and place them in the project directory. Then run:

   ```bash
   tar -xzvf adversarial_qna_images.tar.gz
   tar -xzvf unconstrained_attack_images.tar.gz
   ```
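
The setup steps can also be checked from Python. The following is a minimal sketch, not part of the repository: the helper names `python_is_supported` and `extract_archives` are hypothetical, and it assumes the two archives were downloaded under the file names used in the `tar` commands.

```python
import sys
import tarfile
from pathlib import Path

# Archive names as given in the dataset preparation step.
ARCHIVES = ["adversarial_qna_images.tar.gz", "unconstrained_attack_images.tar.gz"]


def python_is_supported(version_info=sys.version_info, minimum=(3, 10)):
    """Return True if the interpreter meets the Python 3.10+ requirement."""
    return tuple(version_info[:2]) >= minimum


def extract_archives(project_dir=".", archives=ARCHIVES):
    """Extract each archive found in project_dir; return the names not found."""
    root = Path(project_dir)
    missing = []
    for name in archives:
        archive = root / name
        if not archive.is_file():
            missing.append(name)
            continue
        # Rough equivalent of `tar -xzf <name>` run inside the project directory.
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(path=root)
    return missing
```

Calling `extract_archives()` from the project directory returns the list of archives that were not found, so an empty list means both were extracted.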

## Usage

The project includes several scripts and shell commands for specific tasks. Here are some examples:

- Run the constrained and unconstrained attacks, as well as the QnA task, for the InstructBLIP model:
  ```bash
  bash scripts/run_instructblip_attack.sh
  ```

  This step collects the model responses and calculates the metrics.

  The procedure for running MiniGPT-4 is similar.

- Run the experiments for the baseline defense methods:
  ```bash
  bash scripts/run_instructblip_baseline.sh
  ```

- Run our LLaVAGuard defense methods:
  ```bash
  bash scripts/run_instructblip_safety_patch.sh
  ```
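
To launch the three InstructBLIP scripts from one place, a small driver like the one below may help. It is a sketch under assumptions: the function `run_scripts` is hypothetical, and running the scripts back to back in this order is a convenience chosen here, not a requirement stated by the project.

```python
import subprocess

# Script paths exactly as listed in the usage examples.
INSTRUCTBLIP_SCRIPTS = [
    "scripts/run_instructblip_attack.sh",
    "scripts/run_instructblip_baseline.sh",
    "scripts/run_instructblip_safety_patch.sh",
]


def run_scripts(scripts=INSTRUCTBLIP_SCRIPTS, dry_run=True):
    """Build one `bash <script>` command per script and optionally run them.

    With dry_run=True (the default) nothing is executed; the commands are
    only returned so they can be inspected first.
    """
    commands = [["bash", script] for script in scripts]
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError on a non-zero exit code,
            # which stops the loop at the first failing script.
            subprocess.run(command, check=True)
    return commands
```

Calling `run_scripts()` only lists the commands; `run_scripts(dry_run=False)` executes them from the current working directory, which should be the project root.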

## Contributing

Contributions to LLaVAGuard are welcome. Please submit pull requests to the repository with a clear description of the changes and their purpose.

## License

This project is released under the Apache 2.0 License.