Pratyush Maini committed
Commit · 2258e50
1 Parent(s): dffff5c
content
- app.py +61 -1
- figures/ACR.png +0 -0
- figures/bigger.png +0 -0
- figures/gcg.png +0 -0
- figures/judge.png +0 -0
- figures/sanity.png +0 -0
app.py
CHANGED
@@ -24,7 +24,20 @@ def update_csv_dropdown(model_name):
     return gr.Dropdown(choices=df['target_str'].tolist(), interactive=True)

 with gr.Blocks() as demo:
-    gr.Markdown(
+    gr.Markdown(
+        """
+        # Rethinking LLM Memorization through the Lens of Adversarial Compression
+
+        Authors: Avi Schwarzschild\*, Zhili Feng\*, Pratyush Maini\*, Zack Lipton, Zico Kolter
+
+        ## Abstract
+
+        Large language models (LLMs) trained on web-scale datasets raise substantial concerns regarding permissible data usage. One major question is whether these models "memorize" all their training data, or whether their integration of many data sources is more akin to how a human would learn and synthesize information. The answer hinges, to a large degree, on how we define memorization. In this work, we propose the Adversarial Compression Ratio (ACR) as a metric for assessing memorization in LLMs: a given string from the training data is considered memorized if it can be elicited by a prompt shorter than the string itself. In other words, these strings can be "compressed" with the model by computing adversarial prompts of fewer tokens. We outline the limitations of existing notions of memorization and show how the ACR overcomes these challenges by (i) offering an adversarial view of measuring memorization, especially for monitoring unlearning and compliance; and (ii) allowing the flexibility to measure memorization for arbitrary strings at reasonably low compute. Our definition serves as a valuable and practical tool for determining when model owners may be violating terms around data usage, providing a potential legal tool and a critical lens through which to address such scenarios.
+
+        ## Play with the Demo
+        Below, we provide an interactive demo to explore the ACR metric for different models and target strings. The demo lets you select a model and a target string, then reports the number of adversarial tokens, the optimal prompt, the adversarial compression ratio, and whether the target string is memorized by the model.
+        """
+    )

     with gr.Row():
         model_dropdown = gr.Dropdown(choices=MODELS, label="Select Model")
@@ -46,4 +59,51 @@ with gr.Blocks() as demo:

     run_button.click(fn=run_check, inputs=[model_dropdown, csv_dropdown], outputs=[num_free_tokens_output, target_length_output, optimal_prompt_output, ratio_output, memorized_output])

+    gr.Markdown(
+        """
+        ## Understanding ACR
+        Below, we provide a high-level overview of the steps involved in calculating the Adversarial Compression Ratio (ACR) for a given target string. The ACR is the ratio of the number of tokens in the target string to the number of tokens in the shortest adversarial prompt that elicits it; an ACR greater than one (a prompt shorter than the target) indicates that the target string is memorized by the model.
+        """
+    )
+
+    with gr.Row():
+        gr.Image("figures/ACR.png", label="Calculating ACR")
+
+    gr.Markdown(
+        """
+        ## Rethinking Copyright Law with ACR
+        Past definitions of memorization have struggled to capture the nuances of copyright as it applies to LLMs: some treat only "exact regurgitation" as memorization, while others count mere "training membership" as memorization. The ACR metric offers a new perspective, providing a balanced and calibrated view of memorization that can be used to monitor compliance with data-usage terms.
+        """
+    )
+    with gr.Row():
+        gr.Image("figures/judge.png", label="Legal View")
+    gr.Markdown(
+        """
+        ## Sanity Checks
+        We consider two sanity checks to ensure that the ACR metric is robust and reliable. 1. First, we evaluate the ACR on various kinds of strings: famous quotes, strings from the training data, unseen news articles from 2024, and random strings. We observe a monotonic decrease in the ACR across these categories.
+        2. Second, we evaluate the ACR on larger models to ensure that the metric scales well with model size. We see that the ACR increases as the model size grows, indicating that larger models are more likely to memorize strings.
+        """
+    )
+    with gr.Row():
+        gr.Image("figures/sanity.png", label="Sanity Checks")
+        gr.Image("figures/bigger.png", label="Bigger Models")
+
+
+
+    gr.Markdown(
+        """
+        ## Citation
+        If you find this work useful, please consider citing our paper:
+
+        ```bibtex
+        @article{schwarzschild2024rethinking,
+          title={Rethinking LLM Memorization through the Lens of Adversarial Compression},
+          author={Schwarzschild, Avi and Feng, Zhili and Maini, Pratyush and Lipton, Zack and Kolter, Zico},
+          journal={arXiv preprint},
+          year={2024}
+        }
+        ```
+        """
+    )
+
 demo.launch(debug=True, show_error=True)
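The "Understanding ACR" text added above describes the metric as a ratio of token counts. As a rough illustration only (this is not code from the commit), the sketch below computes an ACR from tokenized lengths, assuming a Hugging Face tokenizer; the example prompt and target strings are made up, and in practice the prompt would come from the adversarial optimization rather than being hand-written.

```python
# Illustrative sketch (not part of this commit): computing the Adversarial
# Compression Ratio (ACR) from token counts, assuming a Hugging Face tokenizer.
from transformers import AutoTokenizer

def adversarial_compression_ratio(tokenizer, prompt: str, target: str) -> float:
    """ACR = (# tokens in the target string) / (# tokens in the eliciting prompt)."""
    prompt_tokens = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    target_tokens = tokenizer(target, add_special_tokens=False)["input_ids"]
    return len(target_tokens) / len(prompt_tokens)

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # any tokenizer works for the illustration
target = "Ask not what your country can do for you..."  # hypothetical target string
prompt = "JFK inaugural"                                 # stand-in for an optimized adversarial prompt
acr = adversarial_compression_ratio(tokenizer, prompt, target)
print(f"ACR = {acr:.2f}; memorized (by this definition) = {acr > 1}")
```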
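The demo description above also mentions reporting the number of adversarial tokens and an optimal prompt. The skeleton below is a hypothetical outline of that search loop; the `optimize_prompt` and `elicits_target` callables are placeholders (e.g., for a GCG-style token optimizer and a greedy-decoding check) and this is not the repository's implementation.

```python
# Hypothetical skeleton (not the repository's code): find the smallest
# prompt-token budget whose optimized prompt elicits the target string.
from typing import Callable, Optional, Tuple

def find_minimal_prompt(
    elicits_target: Callable[[str], bool],   # True if greedy decoding from the prompt yields the target
    optimize_prompt: Callable[[int], str],   # placeholder for an optimizer over `budget` free tokens
    max_budget: int,
) -> Optional[Tuple[int, str]]:
    """Return (num_free_tokens, optimal_prompt) for the smallest working budget, or None."""
    for budget in range(1, max_budget + 1):
        candidate = optimize_prompt(budget)
        if elicits_target(candidate):
            return budget, candidate
    return None

# Toy usage with stand-in callables, purely to show the control flow.
result = find_minimal_prompt(
    elicits_target=lambda p: len(p) >= 3,   # stand-in success check
    optimize_prompt=lambda b: "x" * b,      # stand-in "optimizer"
    max_budget=8,
)
print(result)  # (3, 'xxx') with these stand-ins
```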
figures/ACR.png
ADDED
figures/bigger.png
ADDED
figures/gcg.png
ADDED
figures/judge.png
ADDED
figures/sanity.png
ADDED