gregH committed on
Commit 7a095b1 · verified · 1 Parent(s): 407d1aa

Update index.html

Files changed (1)
  1. index.html +14 -3
index.html CHANGED
@@ -169,14 +169,25 @@ gradient norm and then apply soft removal on them to mitigate the potential jail
  </div>
  </div>
 
  <div class="example-selector">
- <button onclick="selectExample('example1')">Example 1</button>
- <button onclick="selectExample('example2')">Example 2</button>
  <button onclick="selectExample('example3')">Example 3</button>
  <button onclick="selectExample('example4')">Example 4</button>
  </div>
  <div class="example-box" id="exampleBox">
- <p id="exampleText">Select an example to see it highlighted.</p>
  </div>
 
  <script>
 
  </div>
  </div>
 
+ <p>
+ We interpret Token Highlighter by showcasing the highlighted tokens for several typical jailbreaks. The examples below include
+ token-level jailbreaks (GCG) and prompt-level jailbreaks (TAP). For GCG, which is a token-level jailbreak attack that appends an optimized
+ adversarial suffix to the original malicious query, the highlighted tokens all fall within the inserted suffix. For TAP, which is a
+ prompt-level jailbreak method that uses an attacker LLM to refine the malicious instruction, the highlighted tokens reveal the strategies the attacker used
+ to rewrite the prompt. These strategies include exploiting the LLM's mismatched generalization by
+ role-playing (e.g., "You are a private investigator..."), story-telling (e.g., content dilution), introducing illusive objectives
+ (e.g., writing a dialogue or a report), and instructing the LLM to begin the response with an affirmation (e.g., "begin with 'Sure, here
+ is guidance on ...'").
+ </p>
+
  <div class="example-selector">
+ <button onclick="selectExample('example1')">GCG Example</button>
+ <button onclick="selectExample('example2')">TAP Example</button>
  <button onclick="selectExample('example3')">Example 3</button>
  <button onclick="selectExample('example4')">Example 4</button>
  </div>
  <div class="example-box" id="exampleBox">
+ <p id="exampleText">Select an example to see how it would be highlighted.</p>
  </div>
 
  <script>