---
license: apache-2.0
---

<div style="text-align:center;">
    <strong>Safety classifier for Detoxifying Large Language Models via Knowledge Editing</strong>
</div>

# 💻 Usage

```python
from transformers import RobertaForSequenceClassification, RobertaTokenizer
safety_classifier_dir = 'zjunlp/SafeEdit-Safety-Classifier'
safety_classifier_model = RobertaForSequenceClassification.from_pretrained(safety_classifier_dir)
safety_classifier_tokenizer = RobertaTokenizer.from_pretrained(safety_classifier_dir)
```
You can also download the DINM-Safety-Classifier manually and set `safety_classifier_dir` to your local path.
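Once the model and tokenizer are loaded, running the classifier over a batch of texts is a standard sequence-classification forward pass. The sketch below is a minimal example; the `classify` helper is a hypothetical convenience function (not part of this repository), and the meaning of each label index is an assumption you should verify against `model.config.id2label`.

```python
import torch
from transformers import RobertaForSequenceClassification, RobertaTokenizer

def classify(texts, model, tokenizer):
    """Return the predicted label index for each input text."""
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():  # inference only, no gradients needed
        logits = model(**inputs).logits
    return logits.argmax(dim=-1).tolist()

safety_classifier_dir = 'zjunlp/SafeEdit-Safety-Classifier'
model = RobertaForSequenceClassification.from_pretrained(safety_classifier_dir)
tokenizer = RobertaTokenizer.from_pretrained(safety_classifier_dir)

# NOTE: which index denotes "safe" vs. "unsafe" is an assumption here;
# inspect model.config.id2label to confirm the mapping.
predictions = classify(["Model response to be checked for safety."], model, tokenizer)
print(predictions)
```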


# 📖 Citation

If you use our work, please cite our paper:

```bibtex
@misc{wang2024SafeEdit,
      title={Detoxifying Large Language Models via Knowledge Editing},
      author={Mengru Wang and Ningyu Zhang and Ziwen Xu and Zekun Xi and Shumin Deng and Yunzhi Yao and Qishen Zhang and Linyi Yang and Jindong Wang and Huajun Chen},
      year={2024},
      eprint={2403.14472},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2403.14472}
}
```