Update app.py
app.py CHANGED
@@ -171,16 +171,17 @@ with gr.Blocks(theme=gr.themes.Default(), css=css) as demo:
                          elem_classes=['explanation_accordion']
                          ):
                gr.Markdown(
-                    '''This idea was
-                    An honorary mention of **Speaking Probes** ([Dar, 2023](https://towardsdatascience.com/speaking-probes-self-interpreting-models-7a3dc6cb33d6)
+                    '''This idea was investigated in the paper **Patchscopes** ([Ghandeharioun et al., 2024](https://arxiv.org/abs/2401.06102)) and was further explored in **SelfIE** ([Chen et al., 2024](https://arxiv.org/abs/2403.10949)).
+                    An honorary mention goes to **Speaking Probes** ([Dar, 2023](https://towardsdatascience.com/speaking-probes-self-interpreting-models-7a3dc6cb33d6) - my own work 🥳), which was less mature but had the same idea in mind.
                    We will follow the SelfIE implementation in this space for concreteness. Patchscopes are so general that they encompass many other interpretation techniques too!!!
                    ''', line_breaks=True)

            with gr.Accordion(label='🌾 The idea is really simple: models are able to understand their own hidden states by nature! 🌾',
                              elem_classes=['explanation_accordion']):
                gr.Markdown(
-                    '''
-                    we
+                    '''According to the residual stream view ([nostalgebraist, 2020](https://www.lesswrong.com/posts/AcKRB8wDpdaN6v6ru/interpreting-gpt-the-logit-lens)), internal representations from different layers are transferable between layers.
+                    So we can inject a representation from (roughly) any layer into any layer! If we give the model a prompt of the form ``User: [X] Assistant: Sure, I'll repeat your message`` and replace the internal representation of ``[X]`` *during computation* with the hidden state we want to understand,
+                    we expect to get back a summary of the information that exists inside that hidden state. Since the model uses a roughly common latent space, it can understand representations from different layers and different runs!! How cool is that! 💯💯💯
                    ''', line_breaks=True)

        with gr.Column(scale=1):
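For readers who want to see the mechanics behind the second accordion, here is a minimal sketch of this kind of hidden-state patching, written with a PyTorch forward hook on a Hugging Face model. It is not the code of this space: the `gpt2` model, the layer choices (`source_layer`, `target_layer`), and the `Repeat after me: x` placeholder prompt are illustrative assumptions, and SelfIE/Patchscopes wrap the same mechanism in richer prompt templates.

```python
# Minimal hidden-state patching sketch (assumptions: gpt2, arbitrary layer picks,
# and a toy placeholder prompt -- not the actual code used in this space).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# 1) Run a source prompt and grab the hidden state we want to interpret:
#    the residual stream after `source_layer`, at the last token position.
source_layer = 6
with torch.no_grad():
    out = model(**tokenizer("The Eiffel Tower is in Paris", return_tensors="pt"),
                output_hidden_states=True)
# out.hidden_states[k] is the residual stream after block k (index 0 = embeddings).
hidden_state = out.hidden_states[source_layer][0, -1].clone()  # shape: (d_model,)

# 2) Build an interpretation prompt with a placeholder token whose internal
#    representation we overwrite *during computation* of a second run.
interp_ids = tokenizer("Repeat after me: x", return_tensors="pt")
patch_position = interp_ids["input_ids"].shape[1] - 1  # the "x" placeholder
target_layer = 2  # inject into an earlier layer of the second run

def patch_hook(module, inputs, output):
    hidden = output[0]  # decoder blocks return a tuple; [0] is the residual stream
    # Patch only during prefill; later generation steps see one token at a time.
    if hidden.shape[1] > patch_position:
        hidden[0, patch_position] = hidden_state
    return (hidden,) + output[1:]

handle = model.transformer.h[target_layer].register_forward_hook(patch_hook)
try:
    generated = model.generate(**interp_ids, max_new_tokens=15, do_sample=False,
                               pad_token_id=tokenizer.eos_token_id)
finally:
    handle.remove()  # always detach the hook so later runs are unpatched

print(tokenizer.decode(generated[0], skip_special_tokens=True))
```

Because the hook overwrites the block's output, every layer above `target_layer` processes the injected state, which is exactly the cross-layer transferability the residual stream view predicts. With a small base model like `gpt2` the decoded text is often noisy; the instruction-tuned models that SelfIE targets tend to give much cleaner self-interpretations.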