gregH commited on
Commit
354f973
·
verified ·
1 Parent(s): ccefa35

Update index.html

Browse files
Files changed (1) hide show
  1. index.html +7 -3
index.html CHANGED
@@ -104,9 +104,13 @@ Exploring Refusal Loss Landscapes </title>
104
  </p>
105
 
106
  <h2 id="what-is-jailbreak">What is Jailbreak?</h2>
107
- <p>Jailbreak attacks involve maliciously inserting or replacing tokens in the user instruction or rewriting it to bypass and circumvent
108
- the safety guardrails of aligned LLMs. A notable example is that a jailbroken LLM would be tricked into
109
- generating hate speech targeting certain groups of people, as demonstrated below.</p>
 
 
 
 
110
 
111
  <div class="container">
112
  <div id="jailbreak-intro" class="row align-items-center jailbreak-intro-sec">
 
104
  </p>
105
 
106
  <h2 id="what-is-jailbreak">What is Jailbreak?</h2>
107
+ <p>
108
+ Aligned Large Language Models (LLMs) have been shown to exhibit vulnerabilities to jailbreak attacks, which exploit token-level
109
+ or prompt-level manipulations to bypass and circumvent the safety guardrails embedded within these models. A notable example is that
110
+ a jailbroken LLM would be tricked into giving tutorials on how to cause harm to others. Jailbreak techniques often employ
111
+ sophisticated strategies, including but not limited to role-playing , instruction disguising , leading language , and the normalization
112
+ of illicit action, as illustrated in the examples below.
113
+ </p>
114
 
115
  <div class="container">
116
  <div id="jailbreak-intro" class="row align-items-center jailbreak-intro-sec">