protectai/deberta-v3-base-prompt-injection-v2

Text Classification · prompt-injection · Generated from Trainer · Inference Endpoints


asofter committed on Apr 23, 2024

Commit 04adf36 · verified · 1 parent: 69d3788

Update README.md

Add dataset attribution

Files changed (1)

README.md +22 -0


@@ -3,6 +3,14 @@ license: apache-2.0
 base_model: microsoft/deberta-v3-base
 language:
 - en
+datasets:
+- natolambert/xstest-v2-copy
+- VMware/open-instruct
+- alespalla/chatbot_instruction_prompts
+- HuggingFaceH4/grok-conversation-harmless
+- Harelix/Prompt-Injection-Mixed-Techniques-2024
+- OpenSafetyLab/Salad-Data
+- jackhhao/jailbreak-classification
 tags:
 - prompt-injection
 - injection
@@ -48,6 +56,20 @@ This model classifies inputs into benign (`0`) and injection-detected (`1`).
 Over 20 configurations were tested during development to optimize the detection capabilities, focusing on various hyperparameters, training regimens, and dataset compositions.
+### Dataset
+The dataset used for training the model was meticulously assembled from various public open datasets to include a wide range of prompt variations.
+Additionally, prompt injections were crafted using insights gathered from academic research papers, articles, security competitions, and valuable LLM Guard's community feedback.
+In compliance with licensing requirements, attribution is given where necessary based on the specific licenses of the source data. Below is a summary of the licenses and the number of datasets under each:
+- **CC-BY-3.0:** 1 dataset (`VMware/open-instruct`)
+- **MIT License:** 8 datasets
+- **CC0 1.0 Universal:** 1 dataset
+- **No License (public domain):** 6 datasets
+- **Apache License 2.0:** 5 datasets (`alespalla/chatbot_instruction_prompts`, `HuggingFaceH4/grok-conversation-harmless`, `Harelix/Prompt-Injection-Mixed-Techniques-2024`, `OpenSafetyLab/Salad-Data`, `jackhhao/jailbreak-classification`)
+- **CC-BY-4.0:** 1 dataset (`natolambert/xstest-v2-copy:1_full_compliance`)
 ### Evaluation Metrics
 - **Training Performance on the evaluation dataset:**
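This commit touches two places in README.md: the `datasets:` list in the YAML front matter and the license summary under `### Dataset`. The sketch below is one way to cross-check them. It confirms that every dataset named in the front matter also appears in the license summary, and it totals the per-license counts. It is a hypothetical check, not part of the commit or of any Hugging Face tooling. It assumes the front matter stays a flat `- item` list, so the hand-rolled parser is for illustration only; a real check would use a YAML parser. The dataset names and counts are copied from the diff.

```python
# Hypothetical consistency check for the dataset attribution added in this commit.
# Dataset names and license counts are copied from the README diff; the helper
# names (parse_datasets, named_repo_ids) are illustrative, not a library API.

FRONT_MATTER = """\
datasets:
- natolambert/xstest-v2-copy
- VMware/open-instruct
- alespalla/chatbot_instruction_prompts
- HuggingFaceH4/grok-conversation-harmless
- Harelix/Prompt-Injection-Mixed-Techniques-2024
- OpenSafetyLab/Salad-Data
- jackhhao/jailbreak-classification
tags:
- prompt-injection
"""

# license -> (number of datasets, datasets named explicitly in the summary)
LICENSE_SUMMARY = {
    "CC-BY-3.0": (1, ["VMware/open-instruct"]),
    "MIT License": (8, []),
    "CC0 1.0 Universal": (1, []),
    "No License (public domain)": (6, []),
    "Apache License 2.0": (5, [
        "alespalla/chatbot_instruction_prompts",
        "HuggingFaceH4/grok-conversation-harmless",
        "Harelix/Prompt-Injection-Mixed-Techniques-2024",
        "OpenSafetyLab/Salad-Data",
        "jackhhao/jailbreak-classification",
    ]),
    "CC-BY-4.0": (1, ["natolambert/xstest-v2-copy:1_full_compliance"]),
}


def parse_datasets(front_matter: str) -> list[str]:
    """Collect '- item' lines under the top-level 'datasets:' key (flat YAML lists only)."""
    items, inside = [], False
    for line in front_matter.splitlines():
        if line.startswith("datasets:"):
            inside = True
        elif inside and line.startswith("- "):
            items.append(line[2:].strip())
        elif inside:
            break  # the next top-level key ends the list
    return items


def named_repo_ids(summary: dict) -> set[str]:
    """Drop a ':config' suffix so summary entries compare as plain repo ids."""
    return {name.split(":")[0] for _, names in summary.values() for name in names}


datasets = parse_datasets(FRONT_MATTER)
total = sum(count for count, _ in LICENSE_SUMMARY.values())
unattributed = [d for d in datasets if d not in named_repo_ids(LICENSE_SUMMARY)]

print(f"{len(datasets)} datasets in front matter, {total} in license summary")
# -> 7 datasets in front matter, 22 in license summary
print("front-matter datasets without a named license:", unattributed)
# -> front-matter datasets without a named license: []
```

With the values in this commit, every front-matter dataset has a named license entry. The other 15 sources in the summary (8 MIT, 1 CC0, 6 unlicensed) are counted but not named individually.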