Pratyush Maini committed on
Commit
2258e50
·
1 Parent(s): dffff5c
app.py CHANGED
@@ -24,7 +24,20 @@ def update_csv_dropdown(model_name):
     return gr.Dropdown(choices=df['target_str'].tolist(), interactive=True)
 
 with gr.Blocks() as demo:
-    gr.Markdown("<h1><center>Model Memorization Checker</center></h1>")
+    gr.Markdown(
+        """
+        # Rethinking LLM Memorization through the Lens of Adversarial Compression
+
+        Authors: Avi Schwarzschild\*, Zhili Feng\*, Pratyush Maini\*, Zack Lipton, Zico Kolter
+
+        ## Abstract
+
+        Large language models (LLMs) trained on web-scale datasets raise substantial concerns regarding permissible data usage. One major question is whether these models "memorize" all their training data, or whether their integration of many data sources is more akin to how a human would learn and synthesize information. The answer hinges, to a large degree, on how we define memorization. In this work, we propose the Adversarial Compression Ratio (ACR) as a metric for assessing memorization in LLMs: a given string from the training data is considered memorized if it can be elicited by a prompt shorter than the string itself. In other words, these strings can be "compressed" with the model by computing adversarial prompts of fewer tokens. We outline the limitations of existing notions of memorization and show how the ACR overcomes these challenges by (i) offering an adversarial view of measuring memorization, especially for monitoring unlearning and compliance; and (ii) allowing the flexibility to measure memorization for arbitrary strings at reasonably low compute. Our definition serves as a valuable and practical tool for determining when model owners may be violating terms around data usage, providing a potential legal tool and a critical lens through which to address such scenarios.
+
+        ## Play with the Demo
+        Below, we provide an interactive demo to explore the ACR metric for different models and target strings. The demo allows you to select a model and a target string, and then reports the number of adversarial tokens, the optimal prompt, the adversarial compression ratio, and whether the target string is memorized by the model.
+        """
+    )
 
     with gr.Row():
         model_dropdown = gr.Dropdown(choices=MODELS, label="Select Model")
@@ -46,4 +59,51 @@ with gr.Blocks() as demo:
 
     run_button.click(fn=run_check, inputs=[model_dropdown, csv_dropdown], outputs=[num_free_tokens_output, target_length_output, optimal_prompt_output, ratio_output, memorized_output])
 
+    gr.Markdown(
+        """
+        ## Understanding ACR
+        Below, we provide a high-level overview of the steps involved in calculating the Adversarial Compression Ratio (ACR) for a given target string. The ACR is the ratio of the number of tokens in the target string to the number of tokens in the optimal (shortest) adversarial prompt that elicits it. An ACR greater than one means the string can be compressed by the model, indicating that it is likely memorized.
+        """
+    )
+
+    with gr.Row():
+        gr.Image("figures/ACR.png", label="Calculating ACR")
+
+    gr.Markdown(
+        """
+        ## Rethinking Copyright Law with ACR
+        Past definitions of memorization have been limited in their ability to capture the nuances of copyright when it comes to LLMs. Some methods considered "exact regurgitation" to be memorization, while others considered mere "training membership" to be memorization. The ACR metric offers a new perspective, allowing for a balanced and calibrated view of memorization that can be used to monitor compliance with data usage terms.
+        """
+    )
+    with gr.Row():
+        gr.Image("figures/judge.png", label="Legal View")
+    gr.Markdown(
+        """
+        ## Sanity Checks
+        We consider two sanity checks to ensure that the ACR metric is robust and reliable. First, we evaluate the ACR on various kinds of strings: famous quotes, strings from the training data, unseen news articles from 2024, and random strings. We see a monotonic decrease in the ACR across these categories.
+        Second, we evaluate the ACR on larger models to ensure that the metric scales well with model size. We see that the ACR increases as the model size increases, indicating that larger models are more likely to memorize strings.
+        """
+    )
+    with gr.Row():
+        gr.Image("figures/sanity.png", label="Sanity Checks")
+        gr.Image("figures/bigger.png", label="Bigger Models")
+
+
+
+    gr.Markdown(
+        """
+        ## Citation
+        If you find this work useful, please consider citing our paper:
+
+        ```bibtex
+        @article{schwarzschild2024rethinking,
+            title={Rethinking LLM Memorization through the Lens of Adversarial Compression},
+            author={Schwarzschild, Avi and Feng, Zhili and Maini, Pratyush and Lipton, Zack and Kolter, Zico},
+            journal={arXiv preprint},
+            year={2024}
+        }
+        ```
+        """
+    )
+
     demo.launch(debug=True, show_error=True)
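The ACR arithmetic the demo reports is simple enough to sketch in a few lines. This is an illustrative reimplementation, not the code in this commit: the function name is hypothetical, and a whitespace split stands in for the model's real tokenizer.

```python
# Illustrative sketch of the ACR computation (not the code in this commit).
# A whitespace split stands in for the model's tokenizer.

def adversarial_compression_ratio(target_str: str, optimal_prompt: str) -> float:
    """ACR = (# target tokens) / (# prompt tokens); ACR > 1 suggests memorization."""
    return len(target_str.split()) / len(optimal_prompt.split())

target = "to be or not to be that is the question"  # 10 tokens
prompt = "hamlet soliloquy opening"                 # 3 tokens

acr = adversarial_compression_ratio(target, prompt)
print(round(acr, 2), acr > 1.0)  # 3.33 True
```

With a real model the prompt length would come from the tokenizer's count of the optimized adversarial tokens, but the ratio and the ACR > 1 decision rule are the same.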
figures/ACR.png ADDED
figures/bigger.png ADDED
figures/gcg.png ADDED
figures/judge.png ADDED
figures/sanity.png ADDED