Nice Work - Small Suggestion for Discoverability

#1
by JosephCatrambone - opened

Hi all,
I'm going through your paper "A Comprehensive Study of Jailbreak Attacks" and found this model linked via the site. I'm liking it a great deal.

It might be worth updating the model title or adding a model card in the interest of discoverability. If I hadn't been actively searching through the literature, I probably never would have stumbled upon this model or known what it was. When I did a preliminary check of jailbreak detection models, I didn't see this among the results. So much work has gone into the paper that it would be unfortunate for this to languish in obscurity.

Keep up the good work.

Cheers,
Joseph

Hi Joseph,

Thanks for liking the paper and the model. Changing the model name is a good idea. I'll think about it.

Please note that this model achieves high accuracy on the dataset used in the study; accuracy on other data is not guaranteed.

If you want to apply it elsewhere, I recommend fine-tuning it using the script available in our GitHub repo.

Cheers,
ltroin
