Update index.html
index.html +6 -1
@@ -171,7 +171,7 @@ We provide more details about the running flow of Gradient Cuff in the paper.
 
 <h2 id="demonstration">Demonstration</h2>
 <p>We evaluated Gradient Cuff as well as 4 baselines (Perplexity Filter, SmoothLLM, Erase-and-Check, and Self-Reminder)
-against 6 different jailbreak attacks (
+against 6 different jailbreak attacks (GCG, AutoDAN, PAIR, TAP, Base64, and LRL) and benign user queries on 2 LLMs (LLaMA-2-7B-Chat and
 Vicuna-7B-V1.5). We below demonstrate the average refusal rate across these 6 malicious user query datasets as the Average Malicious Refusal
 Rate and the refusal rate on benign user queries as the Benign Refusal Rate. The defending performance against different jailbreak types is
 shown in the provided bar chart.
@@ -223,6 +223,11 @@ We provide more details about the running flow of Gradient Cuff in the paper.
 </div>
 We summarized some key points of the mentioned jailbreak attacks or defenses in the below tables.
 <div id="tabs">
+<ul>
+<li><a href="#tabs-1">Nunc tincidunt</a></li>
+<li><a href="#tabs-2">Proin dolor</a></li>
+<li><a href="#tabs-3">Aenean lacinia</a></li>
+</ul>
 <div id="tabs-1">
 <p>Proin elit arcu, rutrum commodo, vehicula tempus, commodo a, risus. Curabitur nec arcu. Donec sollicitudin mi sit amet mauris. Nam elementum quam ullamcorper ante. Etiam aliquet massa et lorem. Mauris dapibus lacus auctor risus. Aenean tempor ullamcorper leo. Vivamus sed magna quis ligula eleifend adipiscing. Duis orci. Aliquam sodales tortor vitae ipsum. Aliquam nulla. Duis aliquam molestie erat. Ut et mauris vel pede varius sollicitudin. Sed ut dolor nec orci tincidunt interdum. Phasellus ipsum. Nunc tristique tempus lectus.</p>
 </div>
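The Demonstration paragraph touched by this change names two metrics, the Average Malicious Refusal Rate and the Benign Refusal Rate. The following is a minimal sketch of how they could be computed, not the authors' evaluation code. It assumes each query outcome is recorded as a 0/1 refusal flag and that the average over the 6 attack datasets is an unweighted mean of the per-dataset rates. The function names and all data values are hypothetical placeholders.

```python
from statistics import mean

# Hypothetical toy data: 1 = the model refused the query, 0 = it answered.
# These values are illustrative only and are NOT results from the paper.
malicious_refusals = {
    "GCG": [1, 1, 0, 1],
    "AutoDAN": [1, 0, 0, 1],
    "PAIR": [1, 1, 1, 1],
    "TAP": [0, 1, 1, 1],
    "Base64": [1, 1, 0, 0],
    "LRL": [1, 0, 1, 1],
}
benign_refusals = [0, 0, 1, 0, 0, 0, 0, 0]


def refusal_rate(flags):
    """Fraction of queries in one dataset that were refused."""
    return sum(flags) / len(flags)


def average_malicious_refusal_rate(per_attack):
    """Unweighted mean of the per-attack refusal rates (an assumption)."""
    return mean(refusal_rate(flags) for flags in per_attack.values())


avg_malicious = average_malicious_refusal_rate(malicious_refusals)
benign = refusal_rate(benign_refusals)
print(f"Average Malicious Refusal Rate: {avg_malicious:.3f}")
print(f"Benign Refusal Rate: {benign:.3f}")
```

Under this reading, a good defense pushes the first number up while keeping the second one low. If the attack datasets differ in size, a query-weighted average would give a different value, so the averaging convention should be checked against the paper.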