Update README.md
Add dataset attribution
README.md
CHANGED
@@ -3,6 +3,14 @@ license: apache-2.0
|
|
3 |
base_model: microsoft/deberta-v3-base
|
4 |
language:
|
5 |
- en
|
6 |
tags:
|
7 |
- prompt-injection
|
8 |
- injection
|
@@ -48,6 +56,20 @@ This model classifies inputs into benign (`0`) and injection-detected (`1`).
|
|
48 |
|
49 |
Over 20 configurations were tested during development to optimize the detection capabilities, focusing on various hyperparameters, training regimens, and dataset compositions.
|
50 |
|
51 |
### Evaluation Metrics
|
52 |
|
53 |
- **Training Performance on the evaluation dataset:**
|
|
|
3 |
base_model: microsoft/deberta-v3-base
|
4 |
language:
|
5 |
- en
|
6 |
+
datasets:
|
7 |
+
- natolambert/xstest-v2-copy
|
8 |
+
- VMware/open-instruct
|
9 |
+
- alespalla/chatbot_instruction_prompts
|
10 |
+
- HuggingFaceH4/grok-conversation-harmless
|
11 |
+
- Harelix/Prompt-Injection-Mixed-Techniques-2024
|
12 |
+
- OpenSafetyLab/Salad-Data
|
13 |
+
- jackhhao/jailbreak-classification
|
14 |
tags:
|
15 |
- prompt-injection
|
16 |
- injection
|
|
|
56 |
|
57 |
Over 20 configurations were tested during development to optimize the detection capabilities, focusing on various hyperparameters, training regimens, and dataset compositions.
|
58 |
|
59 |
+
### Dataset
|
60 |
+
|
61 |
+
The training dataset was assembled from various public datasets to cover a wide range of prompt variations.
|
62 |
+
Additionally, prompt injections were crafted using insights gathered from academic research papers, articles, security competitions, and feedback from the LLM Guard community.
|
63 |
+
|
64 |
+
Attribution is given where the license of the source data requires it. The licenses and the number of datasets under each are summarized below:
|
65 |
+
|
66 |
+
- **CC-BY-3.0:** 1 dataset (`VMware/open-instruct`)
|
67 |
+
- **MIT License:** 8 datasets
|
68 |
+
- **CC0 1.0 Universal:** 1 dataset
|
69 |
+
- **No license specified (treated as public domain):** 6 datasets
|
70 |
+
- **Apache License 2.0:** 5 datasets (`alespalla/chatbot_instruction_prompts`, `HuggingFaceH4/grok-conversation-harmless`, `Harelix/Prompt-Injection-Mixed-Techniques-2024`, `OpenSafetyLab/Salad-Data`, `jackhhao/jailbreak-classification`)
|
71 |
+
- **CC-BY-4.0:** 1 dataset (`natolambert/xstest-v2-copy:1_full_compliance`)
|
72 |
+
|
73 |
### Evaluation Metrics
|
74 |
|
75 |
- **Training Performance on the evaluation dataset:**
|