Token-Highlighter

gregH committed on Feb 14

Commit

48cf386

verified

1 Parent(s): 60e322b

Update index.html

Files changed (1)

index.html CHANGED

@@ -146,13 +146,12 @@ gradient norm and then apply soft removal on them to mitigate the potential jail
   <span id="Refusal-Loss-Approximation" class="formula" style="display: none;">
     $$
     \displaystyle
-    \begin{aligned}
-    f_\theta(x) &=1-\frac{1}{N}\sum_{i=1}^N JB(y_i)\\
-    JB (y_i) &=  \begin{cases}
-         1 \text{, if $y_i$ contains any jailbreak keyword;} \\
-         0 \text{, otherwise.}
-     \end{cases}
-    \end{aligned}
+    \begin{aligned}
+      \label{eq:influence}
+      \mathtt{Influence} (x_i) =& \Vert \nabla_{x_i} \log P_\theta(y|x_{1:n}) \Vert_2 \\
+      \mathcal{X} =& \mathtt{argtop}\text{-}n\alpha(\{\mathtt{Influence}(x_i), \forall x_i \in x_{1:n}\}) \\
+      \mathcal{Q} =& \{q_i, \forall x_i \in \mathcal{X}\}
+  \end{aligned}
     $$
   </span>
   <span id="Gradient-Estimation" class="formula" style="display: none;">$$\displaystyle g_\theta(x)=\sum_{i=1}^P \frac{f_\theta(x\oplus \mu u_i)-f_\theta(x)}{\mu} u_i $$</span>
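
The Influence and argtop-nα definitions added in this commit can be illustrated with a minimal Python sketch. It is not the Space's implementation. It assumes the per-token gradients of log P_θ(y|x_{1:n}) have already been computed elsewhere (for example with an autodiff framework) and are passed in as plain lists. The function names, the sample gradients, and the choice to round nα up are all hypothetical.

```python
import math


def influence_scores(token_grads):
    """Influence(x_i): L2 norm of the gradient vector for each token."""
    return [math.sqrt(sum(g * g for g in grad)) for grad in token_grads]


def top_influence_indices(scores, alpha):
    """Indices of the n*alpha highest-influence tokens, in original order.

    Rounding n*alpha up is an assumption; the formula does not say how
    a fractional count is handled.
    """
    n = len(scores)
    k = min(n, math.ceil(n * alpha))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical precomputed gradients, one vector per input token.
token_grads = [[0.1, 0.0], [3.0, 4.0], [0.0, 0.5], [1.0, 1.0]]
scores = influence_scores(token_grads)
selected = top_influence_indices(scores, alpha=0.5)
print(selected)
```

The returned indices play the role of the set X in the formula; how the corresponding q_i are then treated is outside the scope of this sketch.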