Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing Paper • 2406.08464 • Published Jun 12, 2024 • 67
ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs Paper • 2402.11753 • Published Feb 19, 2024 • 6
SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding Paper • 2402.08983 • Published Feb 14, 2024 • 4
Evaluating Transformer's Ability to Learn Mildly Context-Sensitive Languages Paper • 2309.00857 • Published Sep 2, 2023 • 1