dar-tau committed on
Commit 41ee80a · verified · 1 Parent(s): 15efd62

Update app.py

Files changed (1)
  1. app.py +5 -4
app.py CHANGED
@@ -171,16 +171,17 @@ with gr.Blocks(theme=gr.themes.Default(), css=css) as demo:
                 elem_classes=['explanation_accordion']
             ):
                 gr.Markdown(
-                    '''This idea was explored in the paper **Patchscopes** ([Ghandeharioun et al., 2024](https://arxiv.org/abs/2401.06102)) and was later investigated further in **SelfIE** ([Chen et al., 2024](https://arxiv.org/abs/2403.10949)).
-                    An honorary mention of **Speaking Probes** ([Dar, 2023](https://towardsdatascience.com/speaking-probes-self-interpreting-models-7a3dc6cb33d6) -- my own work!! 🥳) which was less mature but had the same idea in mind.
+                    '''This idea was investigated in the paper **Patchscopes** ([Ghandeharioun et al., 2024](https://arxiv.org/abs/2401.06102)) and was further explored in **SelfIE** ([Chen et al., 2024](https://arxiv.org/abs/2403.10949)).
+                    An honorary mention of **Speaking Probes** ([Dar, 2023](https://towardsdatascience.com/speaking-probes-self-interpreting-models-7a3dc6cb33d6) - my own work 🥳), which was less mature but had the same idea in mind.
                     We will follow the SelfIE implementation in this space for concreteness. Patchscopes are so general that they encompass many other interpretation techniques too!!!
                     ''', line_breaks=True)

                 with gr.Accordion(label='👾 The idea is really simple: models are able to understand their own hidden states by nature! 👾',
                                   elem_classes=['explanation_accordion']):
                     gr.Markdown(
-                    '''If I give a model a prompt of the form ``User: [X] Assistant: Sure'll I'll repeat your message`` and replace ``[X]`` *during computation* with the hidden state we want to understand,
-                    we hope to get back a summary of the information that exists inside the hidden state, because it is encoded in a latent space the model uses itself!! How cool is that! 😯😯😯
+                    '''According to the residual stream view ([nostalgebraist, 2020](https://www.lesswrong.com/posts/AcKRB8wDpdaN6v6ru/interpreting-gpt-the-logit-lens)), internal representations from different layers are transferable between layers.
+                    So we can inject a representation from (roughly) any layer into any layer! If I give a model a prompt of the form ``User: [X] Assistant: Sure, I'll repeat your message`` and replace the internal representation of ``[X]`` *during computation* with the hidden state we want to understand,
+                    we expect to get back a summary of the information that exists inside the hidden state. Since the model uses a roughly common latent space, it can understand representations from different layers and different runs!! How cool is that! 😯😯😯
                     ''', line_breaks=True)

                 with gr.Column(scale=1):
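
For readers curious what the patching described in the new text looks like in code, below is a minimal sketch of the idea, not this Space's actual implementation. It assumes a Hugging Face causal LM with a standard forward-hook API; the model name (`gpt2`), the layer indices, and the placeholder token `X` are illustrative choices that do not come from the commit.

```python
# Minimal sketch of hidden-state patching (the Patchscopes/SelfIE-style idea
# described above). All specifics here are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # placeholder model choice
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

# 1) Run a source prompt and keep one hidden state we want to interpret.
src = tok("The Eiffel Tower is in the city of", return_tensors="pt")
with torch.no_grad():
    hidden_states = model(**src, output_hidden_states=True).hidden_states
source_layer, source_pos = 6, -1                       # arbitrary illustrative picks
patch_vector = hidden_states[source_layer][0, source_pos]

# 2) Build the interpretation prompt with a placeholder token to overwrite.
prompt = "User: X Assistant: Sure, I'll repeat your message:"
ids = tok(prompt, return_tensors="pt")
x_id = tok(" X", add_special_tokens=False).input_ids[0]
patch_pos = (ids.input_ids[0] == x_id).nonzero().item()

# 3) Overwrite the placeholder's residual stream at a target layer via a hook.
target_layer = 3                                       # arbitrary illustrative pick

def patch_hook(module, inputs, output):
    hs = output[0]                                     # (batch, seq, hidden)
    if hs.shape[1] > patch_pos:                        # skip 1-token cached steps
        hs[:, patch_pos, :] = patch_vector
    return (hs,) + output[1:]

handle = model.transformer.h[target_layer].register_forward_hook(patch_hook)
try:
    gen = model.generate(**ids, max_new_tokens=20, do_sample=False,
                         pad_token_id=tok.eos_token_id)
finally:
    handle.remove()

print(tok.decode(gen[0][ids.input_ids.shape[1]:]))
```

If this runs as sketched, the continuation after ``Assistant:`` reflects the injected state rather than the literal placeholder. The actual SelfIE setup the Space says it follows differs in detail (e.g., prompt design and where the representation is injected), so treat this only as the shape of the technique.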