Model Card for GuardrailsAI/prompt-saturation-attack-detector
A small model to detect saturation jailbreak attacks. Not intended for standalone use against other kinds of jailbreaks.
Model Details
Model Description
- Developed by: Guardrails AI, Joseph Catrambone
- Funded by: Guardrails AI
- Model type: Transformer, BERT
- Language(s) (NLP): English
- License: Restrictive
- Finetuned from model: bert-tiny
Uses
Designed as a small prefilter for a subset of saturation attacks.
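A minimal sketch of how such a prefilter might sit in front of a larger system. It assumes the model loads through the standard `transformers` text-classification pipeline under the ID `GuardrailsAI/prompt-saturation-attack-detector`; this card does not document the interface, so treat that as an assumption. The label name `LABEL_1` and the `0.5` threshold are hypothetical placeholders, and the helper names are illustrative only.

```python
from typing import Callable, Dict, List

# Model ID as listed on this card; loading it via the text-classification
# pipeline is an assumption, not a documented interface.
MODEL_ID = "GuardrailsAI/prompt-saturation-attack-detector"

# Hypothetical label name: inspect the model's config.id2label for the real one.
ATTACK_LABEL = "LABEL_1"

Classifier = Callable[[str], List[Dict[str, object]]]


def load_classifier() -> Classifier:
    """Build the pipeline (needs `transformers`, a backend, and model access)."""
    from transformers import pipeline  # lazy import so the rest runs without it

    pipe = pipeline("text-classification", model=MODEL_ID)
    # Saturation prompts tend to be long, so truncate to the model's max length.
    return lambda text: pipe(text, truncation=True)


def is_saturation_attack(
    prompt: str, classify: Classifier, threshold: float = 0.5
) -> bool:
    """Return True when the classifier flags the prompt at or above threshold."""
    result = classify(prompt)[0]
    return result["label"] == ATTACK_LABEL and float(result["score"]) >= threshold
```

In use, `is_saturation_attack(user_prompt, load_classifier())` would run before the prompt reaches the main model. Passing the classifier in as an argument keeps the decision logic testable without downloading weights.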
Out-of-Scope Use
Not designed to catch other types of jailbreaks. Saturation protection is one part of a more complete suite of defenses against improper use of ML systems.
Model tree for GuardrailsAI/prompt-saturation-attack-detector
Base model
google-bert/bert-base-uncased