Update index.html
index.html +14 -3
index.html
CHANGED
@@ -169,14 +169,25 @@ gradient norm and then apply soft removal on them to mitigate the potential jail
|
|
169 |
</div>
|
170 |
</div>
|
171 |
|
172 |
<div class="example-selector">
|
173 |
-
<button onclick="selectExample('example1')">Example
|
174 |
-
<button onclick="selectExample('example2')">Example
|
175 |
<button onclick="selectExample('example3')">Example 3</button>
|
176 |
<button onclick="selectExample('example4')">Example 4</button>
|
177 |
</div>
|
178 |
<div class="example-box" id="exampleBox">
|
179 |
-
<p id="exampleText">Select an example to see it highlighted.</p>
|
180 |
</div>
|
181 |
|
182 |
<script>
|
|
|
169 |
</div>
|
170 |
</div>
|
171 |
|
172 |
+
<p>
|
173 |
+
We interpret Token Highlighter by showcasing the highlighted tokens for several typical jailbreaks. We provide examples below, including
|
174 |
+
token-level jailbreaks (GCG) and prompt-level jailbreaks (TAP). For GCG, which is a token-level jailbreak attack that appends an optimized
|
175 |
+
adversarial suffix to the original malicious query, the highlighted tokens all fall within the inserted suffix. For TAP, which is a
|
176 |
+
prompt-level jailbreak method that uses an attacker LLM to refine the malicious instruction, we highlight the strategies the attacker used
|
177 |
+
to rewrite the prompt. These include exploiting the LLM's mismatched generalization through
|
178 |
+
role-playing (e.g., "You are a private investigator..."), story-telling (e.g., content dilution), introducing illusive objectives
|
179 |
+
(e.g., writing a dialogue or a report), and instructing the LLM to begin the response with an affirmation (e.g., "begin with 'Sure, here
|
180 |
+
is guidance on ...'").
|
181 |
+
</p>
|
182 |
+
|
183 |
<div class="example-selector">
|
184 |
+
<button onclick="selectExample('example1')">GCG Example</button>
|
185 |
+
<button onclick="selectExample('example2')">TAP Example</button>
|
186 |
<button onclick="selectExample('example3')">Example 3</button>
|
187 |
<button onclick="selectExample('example4')">Example 4</button>
|
188 |
</div>
|
189 |
<div class="example-box" id="exampleBox">
|
190 |
+
<p id="exampleText">Select an example to see how it would be highlighted.</p>
|
191 |
</div>
|
192 |
|
193 |
<script>
|