Update index.html
index.html  +7 -3
@@ -104,9 +104,13 @@ Exploring Refusal Loss Landscapes </title>
      </p>

      <h2 id="what-is-jailbreak">What is Jailbreak?</h2>
-      <p>
-
-
+      <p>
+        Aligned Large Language Models (LLMs) have been shown to exhibit vulnerabilities to jailbreak attacks, which exploit token-level
+        or prompt-level manipulations to bypass and circumvent the safety guardrails embedded within these models. A notable example is that
+        a jailbroken LLM would be tricked into giving tutorials on how to cause harm to others. Jailbreak techniques often employ
+        sophisticated strategies, including but not limited to role-playing, instruction disguising, leading language, and the normalization
+        of illicit action, as illustrated in the examples below.
+      </p>

      <div class="container">
        <div id="jailbreak-intro" class="row align-items-center jailbreak-intro-sec">