Research Demos and Tools for Trustworthy and Safe AI Development and Deployment

- CoP Agentic Red-teaming: Generate jailbreak prompts for LLMs using human-defined principles
- Detect fake audio clips
- Evaluate the robustness of audio deepfake detection
- Evaluate jailbreak risks for Vision-Language Models
- Token Highlighter: demonstration of a jailbreak defense
- Gradient Cuff: demonstration of a jailbreak defense
- Attention Tracker: a prompt injection detector
- LLM benchmark for physical safety
- Protect models from low-voltage-induced bit errors
- Model-agnostic toolkit for neural network calibration (see the temperature-scaling sketch after this list)
- Evaluate adversarial robustness using generative models
- Generate safe responses from language models
- Identify AI-generated text (see the perplexity baseline sketch after this list)
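
Calibration toolkits of this kind commonly implement post-hoc methods such as temperature scaling (Guo et al., 2017): a single scalar T is fitted on held-out data so that softmax(logits / T) better matches observed accuracy. The sketch below is a minimal NumPy illustration of that idea, not the toolkit's actual API; the function names are hypothetical.

```python
# Minimal sketch of post-hoc calibration via temperature scaling.
# Illustrative only: names are hypothetical, not the toolkit's API.
import numpy as np
from scipy.optimize import minimize_scalar

def nll(temperature, logits, labels):
    """Average negative log-likelihood of softmax(logits / temperature)."""
    scaled = logits / temperature
    scaled -= scaled.max(axis=1, keepdims=True)  # numerical stability
    log_probs = scaled - np.log(np.exp(scaled).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def fit_temperature(val_logits, val_labels):
    """Fit the scalar T > 0 that minimizes NLL on held-out data."""
    res = minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded",
                          args=(val_logits, val_labels))
    return res.x

# Toy usage with synthetic validation logits.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=500)
logits = 3.0 * np.eye(3)[labels] + rng.normal(size=(500, 3))
T = fit_temperature(logits, labels)
print(f"fitted temperature: {T:.3f}")  # calibrated probs = softmax(logits / T)
```

Because the method only rescales logits after training, it is model-agnostic: any classifier that outputs logits can be calibrated this way without retraining.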
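A common baseline for identifying AI-generated text, not necessarily the method behind the demo above, is to score a passage's perplexity under a reference language model: machine-written text tends to be more probable (lower perplexity) than human-written text. The sketch below illustrates this heuristic with GPT-2 via Hugging Face `transformers`; the threshold is an arbitrary placeholder that would need tuning on labeled samples.

```python
# Baseline sketch: flag text with low perplexity under a reference LM.
# A generic heuristic for illustration, not the demo's actual method.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under GPT-2 (lower = more LM-like)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean per-token cross-entropy
    return torch.exp(loss).item()

THRESHOLD = 40.0  # hypothetical cutoff; tune on labeled human/AI text
sample = "The quick brown fox jumps over the lazy dog."
ppl = perplexity(sample)
print(f"ppl={ppl:.1f} ->", "AI-like" if ppl < THRESHOLD else "human-like")
```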