cooperleong00/Qwen2.5-7B-Instruct-Jailbroken


cooperleong00 committed 22 days ago

Commit 342a409 · verified · 1 Parent(s): 2d6fda4

Create README.md

Files changed (1)

README.md +9 -0

README.md ADDED

	@@ -0,0 +1,9 @@

+---
+base_model:
+- Qwen/Qwen2.5-7B-Instruct
+---
+A jailbroken Qwen2.5-7B-Instruct model using weight orthogonalization[1].
+The model was jailbroken using a combination of the JailBreakBench and Alpaca-cleaned datasets; JailBreakBench samples that come from HarmfulBench were excluded so that they remain available for testing.
+[1]: Arditi, Andy, et al. "Refusal in language models is mediated by a single direction." arXiv preprint arXiv:2406.11717 (2024).
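
For context (this note is outside the committed README): the weight orthogonalization described in [1] projects a single unit direction out of every weight matrix that writes to the residual stream. A sketch in the notation of [1], where $\hat{r}$ is the unit-norm direction and $W_{\text{out}}$ is any such matrix; both symbols are taken from the cited paper rather than from this README:

```latex
W_{\text{out}}' \;\leftarrow\; W_{\text{out}} - \hat{r}\,\hat{r}^{\top} W_{\text{out}},
\qquad
\hat{r}^{\top} W_{\text{out}}'
  = \hat{r}^{\top} W_{\text{out}} - (\hat{r}^{\top}\hat{r})\,\hat{r}^{\top} W_{\text{out}}
  = 0 \quad \text{since } \lVert \hat{r} \rVert = 1 .
```

The second identity shows why the modified matrix can no longer write any component along $\hat{r}$.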