Token-Highlighter

Running

gregH commited on Feb 13

Commit

354f973

verified ·

1 Parent(s): ccefa35

Update index.html

Files changed (1) hide show

index.html CHANGED Viewed

@@ -104,9 +104,13 @@ Exploring Refusal Loss Landscapes </title>
 </p>
 <h2 id="what-is-jailbreak">What is Jailbreak?</h2>
-<p>Jailbreak attacks involve maliciously inserting or replacing tokens in the user instruction or rewriting it to bypass and circumvent
-  the safety guardrails of aligned LLMs. A notable example is that a jailbroken LLM would be tricked into
-  generating hate speech targeting certain groups of people, as demonstrated below.</p>
 <div class="container">
 <div id="jailbreak-intro" class="row align-items-center jailbreak-intro-sec">

 </p>
 <h2 id="what-is-jailbreak">What is Jailbreak?</h2>
+<p>
+  Aligned Large Language Models (LLMs) have been shown to exhibit vulnerabilities to jailbreak attacks, which exploit token-level
+  or prompt-level manipulations to bypass and circumvent the safety guardrails embedded within these models. A notable example is that
+  a jailbroken LLM would be tricked into giving tutorials on how to cause harm to others. Jailbreak techniques often employ
+  sophisticated strategies, including but not limited to role-playing , instruction disguising , leading language , and the normalization
+  of illicit action, as illustrated in the examples below.
+</p>
 <div class="container">
 <div id="jailbreak-intro" class="row align-items-center jailbreak-intro-sec">