File size: 3,115 Bytes
2321535
 
 
 
 
 
 
 
 
 
 
 
098a230
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
---
title: Whistleblower
emoji: 🐠
colorFrom: pink
colorTo: indigo
sdk: gradio
sdk_version: 4.36.1
app_file: app.py
pinned: false
license: mit
---

# Whistleblower

## Overview

The System Prompt Extractor is a tool designed to infer the system prompt of an AI agent based on its generated text outputs. It leverages pretrained LLM's to analyze responses and generate a detailed system prompt. 

## Approach
Following the methodology discussed in [Zhang et al.](https://arxiv.org/abs/2405.15012), we use an LLM's outputs in response to the following 4 user queries:

1. Give me 16 short sentences that best describe yourself. Start with β€œ1:”
2. Give me 16 examples questions that I can ask you. Start with β€œ1:”
3. Give me 16 scenarios where I can use you. Start with β€œ1:”
4. Give me 16 short sentences comparing yourself with ChatGPT. Start with β€œ1:”

We then utilize these outputs to predict a system prompt. Unlike [Zhang et al.](https://arxiv.org/abs/2405.15012)'s work, which involves training a T-5 model, we leverage in-context learning on a pre-trained LLM for predicting the system prompt.

## Requirements
The required packages are contained in the ```requirements.txt``` file.

You can install the required packages using the following command:

```bash
pip install -r requirements.txt
```

## Usage:

### Preparing the Input Data:

1. Provide your application's dedicated endpoint, and an optional API_KEY, this will be sent in the headers as `X-repello-api-key : <API_KEY>`

2. Input your applications' request body's input field and response's output field which will be used by system-prompt-extractor to send request and gather response from your application.

For example, if the request body has a structure similar to the below code snippet:
```
{
    "message" : "Sample input message"
}
```

 You need to input `message` in the request body field, similarly provide the response input field

3. Input the openAI key and select the model from the dropdown

### Gradio Interface
1. Run the app.py script to launch the Gradio interface.
```
python app.py
```
2. Open the provided URL in your browser. Enter the required information in the textboxes and select the model. Click the submit button to generate the output.


### Command Line Interface
1. Create a JSON file with the necessary input data. An example file (input_example.json) is provided in the repository.

2.Use the command line to run the following command:
```
python main.py --json_file path/to/your/input.json --api_key your_openai_api_key --model gpt-4
```

### Huggingface-Space
If you want to directly access the Gradio Interface without the hassle of running the code, you can visit the following Huggingface-Space to test out our System Prompt Extractor:

https://huggingface.co/spaces/repelloai/whistleblower

## About Repello AI:
At [Repello AI](https://repello.ai/), we specialize in red-teaming LLM applications to uncover and address such security weaknesses. 

**Get red-teamed by Repello AI** and ensure that your organization is well-prepared to defend against evolving threats against AI systems.