a-F1 committed on
Commit
0ed8c52
1 Parent(s): dc2869c

Update README.md

Files changed (1)
  1. README.md +42 -14
README.md CHANGED
@@ -1,17 +1,40 @@
1
  ---
2
  license: mit
3
  ---
4
 
5
- # ICLM-7B unlearned using SimNPO on MUSE Books
6
 
7
  ## Model Details
8
 
9
- - **Base Model**: ICLM-7B fine tuned on the Harry Potter books
10
- - **Unlearning**: SimNPO on MUSE Books
 
 
 
 
11
 
12
  ## Unlearning Algorithm
13
 
14
- This model uses the `SimNPO` unlearning algorithm with the following parameters:
 
 
15
  - Learning Rate: `1e-5`
16
  - beta: `0.7`
17
  - lambda: `1.0`
@@ -26,21 +49,26 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
26
  model = AutoModelForCausalLM.from_pretrained("OPTML-Group/SimNPO-MUSE-Books-iclm-7b", torch_dtype=torch.bfloat16, device_map='auto')
27
  ```
28
 
29
  ## Citation
30
 
31
  If you use this model in your research, please cite:
32
  ```
33
- @misc{fan2024simplicityprevailsrethinkingnegative,
34
- title={Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning},
35
- author={Chongyu Fan and Jiancheng Liu and Licong Lin and Jinghan Jia and Ruiqi Zhang and Song Mei and Sijia Liu},
36
- year={2024},
37
- eprint={2410.07163},
38
- archivePrefix={arXiv},
39
- primaryClass={cs.CL},
40
- url={https://arxiv.org/abs/2410.07163},
41
  }
42
  ```
43
 
44
- ## Contact
45
 
46
- For questions or issues regarding this model, please contact chongyu.fan93@gmail.com.
 
1
  ---
2
  license: mit
3
+ datasets:
4
+ - muse-bench/MUSE-Books
5
+ language:
6
+ - en
7
+ base_model:
8
+ - muse-bench/MUSE-books_target
9
+ pipeline_tag: text-generation
10
+ library_name: transformers
11
+ tags:
12
+ - unlearn
13
+ - machine-unlearning
14
+ - llm-unlearning
15
+ - data-privacy
16
+ - large-language-models
17
+ - trustworthy-ai
18
+ - trustworthy-machine-learning
19
+ - language-model
20
  ---
21
 
22
+ # SimNPO-Unlearned Model on Task "MUSE - Books"
23
 
24
  ## Model Details
25
 
26
+ - **Unlearning**:
27
+ - **Task**: [🤗datasets/muse-bench/MUSE-Books](https://huggingface.co/datasets/muse-bench/MUSE-Books)
28
+ - **Method**: [SimNPO](https://arxiv.org/abs/2410.07163)
29
+ - **Origin Model**: [🤗muse-bench/MUSE-books_target](https://huggingface.co/muse-bench/MUSE-books_target)
30
+ - **Code Base**: [github.com/OPTML-Group/Unlearn-Simple](https://github.com/OPTML-Group/Unlearn-Simple)
31
+ - **Research Paper**: ["Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning"](https://arxiv.org/abs/2410.07163)
32
 
33
  ## Unlearning Algorithm
34
 
35
+ This model uses the `SimNPO` unlearning algorithm with the following optimization objective:
36
+ $$\ell_{SimNPO}(\mathbf{\theta}) = \mathbb{E}_{(x, y) \in \mathcal{D}_f}\left[-\frac{2}{\beta}\log\sigma\left(-\frac{\beta}{|y|}\log\pi_{\mathbf{\theta}}(y|x) - \gamma\right)\right] + \lambda \mathbb{E}_{(x, y) \in \mathcal{D}_r}[-\log\pi_{\mathbf{\theta}} (y|x)]$$
37
+ Unlearning hyper-parameters:
38
  - Learning Rate: `1e-5`
39
  - beta: `0.7`
40
  - lambda: `1.0`
 
49
  model = AutoModelForCausalLM.from_pretrained("OPTML-Group/SimNPO-MUSE-Books-iclm-7b", torch_dtype=torch.bfloat16, device_map='auto')
50
  ```
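
To make the objective in the "Unlearning Algorithm" section concrete, here is a minimal pure-Python sketch of how the two terms could be computed from per-token log-probabilities. It is an illustration only, not the code from the Unlearn-Simple repository: the function names are hypothetical, `beta=0.7` and `lam=1.0` follow the hyper-parameters listed in this card, and `gamma=0.0` is an assumed default because the value of gamma is not shown here.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def simnpo_forget_loss(token_logprobs, beta=0.7, gamma=0.0):
    """Forget term for one sequence y given x.

    token_logprobs: per-token values of log pi_theta(y_t | x, y_<t),
    so their sum is log pi_theta(y | x) and their count is |y|.
    gamma=0.0 is an assumption made for this sketch.
    """
    seq_logprob = sum(token_logprobs)
    length = len(token_logprobs)
    z = -(beta / length) * seq_logprob - gamma
    return -(2.0 / beta) * log_sigmoid(z)


def retain_loss(token_logprobs):
    """Retain term for one sequence: negative log-likelihood of y given x."""
    return -sum(token_logprobs)


def simnpo_objective(forget_batch, retain_batch, beta=0.7, gamma=0.0, lam=1.0):
    """Batch average of the forget term plus lam times the retain term."""
    forget = sum(simnpo_forget_loss(t, beta, gamma) for t in forget_batch) / len(forget_batch)
    retain = sum(retain_loss(t) for t in retain_batch) / len(retain_batch)
    return forget + lam * retain
```

The forget term shrinks as the model assigns lower length-normalized log-probability to forget-set sequences, while the retain term is the usual negative log-likelihood on the retain set. In practice the per-token log-probabilities would come from the model's logits; a training implementation would use tensor operations with autograd rather than plain floats.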
51
 
52
+ ## Evaluation Results
53
+ ||VerbMem Df|KnowMem Df|PrivLeak|KnowMem Dr|
54
+ |---|---|---|---|---|
55
+ |Origin|99.56|58.32|-56.32|67.01|
56
+ |Retrain|14.30|28.90|0.00|74.50|
57
+ |NPO|0.00|0.00|-31.17|23.71|
58
+ |**SimNPO**|0.00|0.00|-19.82|48.27|
59
+
60
  ## Citation
61
 
62
  If you use this model in your research, please cite:
63
  ```
64
+ @article{fan2024simplicity,
65
+ title={Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning},
66
+ author={Fan, Chongyu and Liu, Jiancheng and Lin, Licong and Jia, Jinghan and Zhang, Ruiqi and Mei, Song and Liu, Sijia},
67
+ journal={arXiv preprint arXiv:2410.07163},
68
+ year={2024}
69
  }
70
  ```
71
 
72
+ ## Reporting Issues
73
 
74
+ Reporting issues with the model: [github.com/OPTML-Group/Unlearn-Simple](https://github.com/OPTML-Group/Unlearn-Simple)