ACMCMC committed on
Commit 0123a49
1 parent: 8fdcd57

Comments for the examples

Files changed (1)
  1. app.py +25 -9
app.py CHANGED
@@ -96,7 +96,9 @@ def unhomoglyphize_text(homoglyphed_text):
  return unhomoglyphed_text


- def process_user_text(user_text):
  # If the user text doesn't contain homoglyphs, don't trigger the alarm
  if not bool(
  confusable_homoglyphs.confusables.is_confusable(
@@ -145,6 +147,12 @@ demo = gr.Interface(
  fn=process_user_text,
  inputs=[
  gr.Textbox(lines=5, placeholder="Enter your text here...", label="Text"),
  ],
  outputs=[
  # A checkbox: is dangerous or not
@@ -171,28 +179,36 @@ Written by: [Aldan Creo](https://acmc-website.web.app/intro)
  allow_flagging="never",
  examples=[
  [
- "Dr. Capy Cosmos, a capybara unlike any other, astounded the scientific community with his groundbreaking research in astrophysics. With his keen sense of observation and unparalleled ability to interpret cosmic data, he uncovered new insights into the mysteries of black holes and the origins of the universe. As he peered through telescopes with his large, round eyes, fellow researchers often remarked that it seemed as if the stars themselves whispered their secrets directly to him. Dr. Cosmos not only became a beacon of inspiration to aspiring scientists but also proved that intellect and innovation can be found in the most unexpected of creatures."
  ],
  [
- "Dr. Capу Cosmos, a caрybаra unlіkе any other, astounded the scientific community with hіs groundbreakіng reѕearcһ in astrophysics. With hiѕ keen sense of observation and unparаlleled ability to interpret cosmic dаta, he uncovеred new іnsightѕ into tһe myѕteries of black holes аnd the origins of the universe. Aѕ he peered through telescopes with his large, round eyes, fellow reѕearchers often remarked that it seemed as if the stars themѕelves whiѕpered theіr secrets directlу to him. Dr. Cosmos not only became a beacon of inspіration to aspiring scientіsts but also proved thаt intellect and іnnovation can bе found in the most unexpecteԁ οf сreatures."
  ],
  [
- "Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights for both pre-trained variants and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as a laptop, desktop or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone."
  ],
  [
- "Gemma iѕ a family of lightweіght, ѕtatе-οf-the-art open models from Google, built from the same research аnd technolοgy uѕed to сreate tһe Gemini models. Theу are text-to-text, decoder-only lаrge lаnguage models, available in English, with οpen weightѕ for both рre-trainеd vаrіants аnd instruction-tuned variantѕ. Gemma models are well-suited for a vаrietу of text generation tasks, including question answering, summarization, and rеaѕoning. Their relatively small size makes it possible to dеploy them іn environments witһ limited resourceѕ such as a laptop, desktop or your own cloud infraѕtructure, democratizing acceѕs to state of the art AΙ models and һelping foster іnnovation for everyone."
  ],
  [
- "We run the model on the set of prompts containing known and unknown entities. Inspired by Meng et al. (2022a); Geva et al. (2023); Nanda et al. (2023) we use the residual stream of the final token of the entity, 𝒙 known and 𝒙 unknown. In each layer (l), we compute the activations of each latent in the SAE, i.e. al,j⁢(𝒙lknown) and al,j⁢(𝒙lunknown). For each latent, we obtain the fraction of the time that it is active (i.e. has a value greater than zero) on known and unknown entities respectively: fl,jknown=∑iNknown𝟙⁢[al,j⁢(𝒙l,iknown)>0]Nknown,fl,junknown=∑iNunknown𝟙⁢[al,j⁢(𝒙l,iunknown)>0]Nunknown,(6) where Nknown and Nunknown are the total number of prompts in each subset. Then, we take the difference, obtaining the latent separation scores sl,jknown=fl,jknown−fl,junknown and sl,junknown=fl,junknown−fl,jknown, for detecting known and unknown entities respectively."
  ],
  [
- "The national/ official name of the country, the people and the language are respectively Eλλάδα, Έλληνας, ελληνικά ([ eláδa, élinas, eliniká]), derived from Ancient Greek Ἑλλάς, Ἕλλην, ἑλληνικός ([ hellás, héllen, hellenikós]) ‘Greece, Greek (noun), Greek (adj.)’, which are also to be found in most European languages as Hellas, hellenic, hellénique etc.; Hellenic Republic is the official name of the country in the European Union. The etymology of these words is uncertain. They first occur in the Iliad of Homer (2.683-4) as a designation of a small area in Thessaly, the homeland of Achilles, and its people. (3) Also in Homer, it is possible to find the compound πανέλληνες ([ panhellenes]) denoting all Greeks (from adjective pan ‘all’ + noun hellen), and it is again uncertain under what historical circumstances this local appellation spread to the totality of the Greek nation, although various theories have been proposed (see Babiniotis 2002)."
  ],
  [
- "To form the conditional tense in Spanish, you need to use the infinitive form of the verb and add the corresponding endings for each subject pronoun. Regardless of the verb type (-ar, -er, or -ir), the endings remain the same. In singular-plural order, 1st-3rd, the terminations are: -ía, -ías, -ía, -íamos, -ían, -ían. For example: Hablaría (I would speak) Comerías (You would eat) Escribiría (He/She/You would write) Haríamos (We would do/make) Beberían (You all would drink) Leerían (They/You all would read)."
  ],
  [
- "To form the cοnԁіtіonal tense in Spanish, yοu need tο uѕе the infinitive fοrm of the verb and add the corresponding endіngs for eaсһ subject рronοun. Regardless of the verb type (-ar, -er, or -ir), the endings remain the same. In singular-plural order, 1st-3rd, thе terminations are: -ía, -ías, -ía, -íamos, -ían, -ían. For example: Hablaríа (I woulԁ speak) Comerías (You would еаt) Eѕcribiría (He/Shе/You would write) Haríаmos (We would do/make) Bеberían (Yοu all would ԁrink) Leerían (They/You all would read)."
  ],
  ],
  )
 
  return unhomoglyphed_text


+ def process_user_text(user_text, markdown_comment = None):
+ # The Markdown comment is not used, but it's here to keep the interface consistent
+
  # If the user text doesn't contain homoglyphs, don't trigger the alarm
  if not bool(
  confusable_homoglyphs.confusables.is_confusable(
 
  fn=process_user_text,
  inputs=[
  gr.Textbox(lines=5, placeholder="Enter your text here...", label="Text"),
+ gr.Markdown(
+ label="Why is this example interesting?",
+ show_label=False,
+ value="",
+ visible=False,
+ ),
  ],
  outputs=[
  # A checkbox: is dangerous or not
 
  allow_flagging="never",
  examples=[
  [
+ "Dr. Capy Cosmos, a capybara unlike any other, astounded the scientific community with his groundbreaking research in astrophysics. With his keen sense of observation and unparalleled ability to interpret cosmic data, he uncovered new insights into the mysteries of black holes and the origins of the universe. As he peered through telescopes with his large, round eyes, fellow researchers often remarked that it seemed as if the stars themselves whispered their secrets directly to him. Dr. Cosmos not only became a beacon of inspiration to aspiring scientists but also proved that intellect and innovation can be found in the most unexpected of creatures.",
+ "This is an example of a **normal English text** that doesn't contain homoglyphs. We should see here that it is not classified as dangerous, so the ratio is not even calculated.",
  ],
  [
+ "Dr. Capу Cosmos, a caрybаra unlіkе any other, astounded the scientific community with hіs groundbreakіng reѕearcһ in astrophysics. With hiѕ keen sense of observation and unparаlleled ability to interpret cosmic dаta, he uncovеred new іnsightѕ into tһe myѕteries of black holes аnd the origins of the universe. Aѕ he peered through telescopes with his large, round eyes, fellow reѕearchers often remarked that it seemed as if the stars themѕelves whiѕpered theіr secrets directlу to him. Dr. Cosmos not only became a beacon of inspіration to aspiring scientіsts but also proved thаt intellect and іnnovation can bе found in the most unexpecteԁ οf сreatures.",
+ "This is the same text as above, but now it's been attacked with a **5% homoglyph replacement**. We should see here that it is classified as dangerous, so the ratio is calculated and the alarm is triggered – hence, the method works.",
  ],
  [
+ "Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights for both pre-trained variants and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as a laptop, desktop or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone.",
+ "Again, this is a **normal text**. We should see that it's not classified as dangerous and the ratio is not calculated.",
  ],
  [
+ "Gemma iѕ a family of lightweіght, ѕtatе-οf-the-art open models from Google, built from the same research аnd technolοgy uѕed to сreate tһe Gemini models. Theу are text-to-text, decoder-only lаrge lаnguage models, available in English, with οpen weightѕ for both рre-trainеd vаrіants аnd instruction-tuned variantѕ. Gemma models are well-suited for a vаrietу of text generation tasks, including question answering, summarization, and rеaѕoning. Their relatively small size makes it possible to dеploy them іn environments witһ limited resourceѕ such as a laptop, desktop or your own cloud infraѕtructure, democratizing acceѕs to state of the art AΙ models and һelping foster іnnovation for everyone.",
+ "Same as before – just to provide more evidence that the method works. We replace **5% of the characters with homoglyphs** and see that the alarm is triggered.",
  ],
  [
+ "We run the model on the set of prompts containing known and unknown entities. Inspired by Meng et al. (2022a); Geva et al. (2023); Nanda et al. (2023) we use the residual stream of the final token of the entity, 𝒙 known and 𝒙 unknown. In each layer (l), we compute the activations of each latent in the SAE, i.e. al,j⁢(𝒙lknown) and al,j⁢(𝒙lunknown). For each latent, we obtain the fraction of the time that it is active (i.e. has a value greater than zero) on known and unknown entities respectively: fl,jknown=∑iNknown𝟙⁢[al,j⁢(𝒙l,iknown)>0]Nknown,fl,junknown=∑iNunknown𝟙⁢[al,j⁢(𝒙l,iunknown)>0]Nunknown,(6) where Nknown and Nunknown are the total number of prompts in each subset. Then, we take the difference, obtaining the latent separation scores sl,jknown=fl,jknown−fl,junknown and sl,junknown=fl,junknown−fl,jknown, for detecting known and unknown entities respectively.",
+ "Now, what happens if the text naturally contains homoglyphs? We can see it's now classified as dangerous, and the ratio is calculated, but the alarm is not triggered, showing how the ratio is a good avenue to avoid **false positives**.",
  ],
  [
+ "The national/ official name of the country, the people and the language are respectively Eλλάδα, Έλληνας, ελληνικά ([ eláδa, élinas, eliniká]), derived from Ancient Greek Ἑλλάς, Ἕλλην, ἑλληνικός ([ hellás, héllen, hellenikós]) ‘Greece, Greek (noun), Greek (adj.)’, which are also to be found in most European languages as Hellas, hellenic, hellénique etc.; Hellenic Republic is the official name of the country in the European Union. The etymology of these words is uncertain. They first occur in the Iliad of Homer (2.683-4) as a designation of a small area in Thessaly, the homeland of Achilles, and its people. (3) Also in Homer, it is possible to find the compound πανέλληνες ([ panhellenes]) denoting all Greeks (from adjective pan ‘all’ + noun hellen), and it is again uncertain under what historical circumstances this local appellation spread to the totality of the Greek nation, although various theories have been proposed (see Babiniotis 2002).",
+ "Another example of a text that naturally contains homoglyphs. Again, we avoid flagging it as a **false positive**.",
  ],
  [
+ "To form the conditional tense in Spanish, you need to use the infinitive form of the verb and add the corresponding endings for each subject pronoun. Regardless of the verb type (-ar, -er, or -ir), the endings remain the same. In singular-plural order, 1st-3rd, the terminations are: -ía, -ías, -ía, -íamos, -ían, -ían. For example: Hablaría (I would speak) Comerías (You would eat) Escribiría (He/She/You would write) Haríamos (We would do/make) Beberían (You all would drink) Leerían (They/You all would read).",
+ "Third example of a text that naturally contains homoglyphs. Once more, we avoid flagging it as a **false positive**.",
  ],
  [
+ "To form the cοnԁіtіonal tense in Spanish, yοu need tο uѕе the infinitive fοrm of the verb and add the corresponding endіngs for eaсһ subject рronοun. Regardless of the verb type (-ar, -er, or -ir), the endings remain the same. In singular-plural order, 1st-3rd, thе terminations are: -ía, -ías, -ía, -íamos, -ían, -ían. For example: Hablaríа (I woulԁ speak) Comerías (You would еаt) Eѕcribiría (He/Shе/You would write) Haríаmos (We would do/make) Bеberían (Yοu all would ԁrink) Leerían (They/You all would read).",
+ "This is probably the **most important** example. It's the same as the previous one, but with **5% of the characters replaced** with homoglyphs. So we have natural homoglyphs and artificial homoglyphs. Still, the method works – the alarm is triggered, while it wasn't in the previous example. This shows the method is robust and can be used to detect homoglyph-based attacks even in contexts of varying content matter, complexity and language.",
  ],
  ],
  )
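The example captions describe the decision flow rather than its internals: text without confusable characters is not classified as dangerous and no ratio is computed; text with confusables is scored, and the alarm fires only when the ratio is high, which is how naturally occurring homoglyphs (the SAE, Greek and Spanish examples) avoid false positives. The commit does not show how that ratio is defined, so the sketch below substitutes a simple, hypothetical one: the share of words that mix letters from more than one Unicode script. It also replaces the `confusable_homoglyphs.confusables.is_confusable` call with a plain all-Latin check and simulates the "5% homoglyph replacement" attack with a tiny look-alike table. `HOMOGLYPHS`, `ALARM_THRESHOLD`, `inject_homoglyphs`, `mixed_script_ratio` and `check` are illustrative assumptions, not the app's code.

```python
import random
import re
import unicodedata

# Hypothetical Latin -> Cyrillic look-alike table (a tiny subset, for illustration only).
HOMOGLYPHS = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "i": "\u0456",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
}
ALARM_THRESHOLD = 0.05  # hypothetical cut-off for the ratio


def inject_homoglyphs(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Simulate an attack: swap about `rate` of all characters for look-alikes where possible."""
    rng = random.Random(seed)
    chars = list(text)
    candidates = [i for i, ch in enumerate(chars) if ch in HOMOGLYPHS]
    n_replace = min(len(candidates), round(rate * len(chars)))
    for i in rng.sample(candidates, n_replace):
        chars[i] = HOMOGLYPHS[chars[i]]
    return "".join(chars)


def script_of(ch: str) -> str:
    """Approximate a letter's script by the first word of its Unicode name (e.g. LATIN, GREEK)."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def mixed_script_ratio(text: str) -> float:
    """Share of words whose letters come from more than one script."""
    words = re.findall(r"\w+", text)
    if not words:
        return 0.0
    mixed = sum(1 for w in words if len({script_of(ch) for ch in w if ch.isalpha()}) > 1)
    return mixed / len(words)


def check(text: str) -> dict:
    """Stage 1: all-Latin text is not suspicious and gets no ratio. Stage 2: score and threshold."""
    if all(script_of(ch) == "LATIN" for ch in text if ch.isalpha()):
        return {"suspicious": False, "ratio": None, "alarm": False}
    ratio = mixed_script_ratio(text)
    return {"suspicious": True, "ratio": ratio, "alarm": ratio > ALARM_THRESHOLD}


if __name__ == "__main__":
    clean = "Gemma models are well-suited for a variety of text generation tasks."
    print(check(clean))
    print(check(inject_homoglyphs(clean, rate=0.05)))
    print(check("The Greek name of the country is \u0395\u03bb\u03bb\u03ac\u03b4\u03b1."))
```

With this heuristic, a whole Greek word counts as a single script, so natural foreign-script text keeps a low ratio, while a 5% replacement scatters look-alikes across many otherwise-Latin words. The app's `confusable_homoglyphs` check is not the same thing: the captions treat the clean Spanish example as naturally containing homoglyphs, which this all-Latin check would not.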