asofter committed · Commit 04adf36 · verified · 1 Parent(s): 69d3788

Update README.md

Add dataset attribution

Files changed (1)
  1. README.md +22 -0
README.md CHANGED
@@ -3,6 +3,14 @@ license: apache-2.0
  base_model: microsoft/deberta-v3-base
  language:
  - en
+ datasets:
+ - natolambert/xstest-v2-copy
+ - VMware/open-instruct
+ - alespalla/chatbot_instruction_prompts
+ - HuggingFaceH4/grok-conversation-harmless
+ - Harelix/Prompt-Injection-Mixed-Techniques-2024
+ - OpenSafetyLab/Salad-Data
+ - jackhhao/jailbreak-classification
  tags:
  - prompt-injection
  - injection
@@ -48,6 +56,20 @@ This model classifies inputs into benign (`0`) and injection-detected (`1`).

  Over 20 configurations were tested during development to optimize the detection capabilities, focusing on various hyperparameters, training regimens, and dataset compositions.

+ ### Dataset
+
+ The dataset used for training the model was meticulously assembled from various public open datasets to include a wide range of prompt variations.
+ Additionally, prompt injections were crafted using insights gathered from academic research papers, articles, security competitions, and valuable LLM Guard's community feedback.
+
+ In compliance with licensing requirements, attribution is given where necessary based on the specific licenses of the source data. Below is a summary of the licenses and the number of datasets under each:
+
+ - **CC-BY-3.0:** 1 dataset (`VMware/open-instruct`)
+ - **MIT License:** 8 datasets
+ - **CC0 1.0 Universal:** 1 dataset
+ - **No License (public domain):** 6 datasets
+ - **Apache License 2.0:** 5 datasets (`alespalla/chatbot_instruction_prompts`, `HuggingFaceH4/grok-conversation-harmless`, `Harelix/Prompt-Injection-Mixed-Techniques-2024`, `OpenSafetyLab/Salad-Data`, `jackhhao/jailbreak-classification`)
+ - **CC-BY-4.0:** 1 dataset (`natolambert/xstest-v2-copy:1_full_compliance`)
+
  ### Evaluation Metrics

  - **Training Performance on the evaluation dataset:**
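
The license summary added in this commit can be cross-checked against the new `datasets:` front-matter entries. The following is a minimal sketch, not part of the commit: the names `LICENSE_SUMMARY` and `FRONT_MATTER_DATASETS` are hypothetical, the values are transcribed by hand from the diff above, and the `:1_full_compliance` suffix on the CC-BY-4.0 entry is assumed to be a subset qualifier and is dropped for comparison.

```python
# Hand-transcribed from the "### Dataset" section added in this commit:
# license name -> (dataset count, datasets named explicitly in the README).
LICENSE_SUMMARY = {
    "CC-BY-3.0": (1, ["VMware/open-instruct"]),
    "MIT License": (8, []),
    "CC0 1.0 Universal": (1, []),
    "No License (public domain)": (6, []),
    "Apache License 2.0": (5, [
        "alespalla/chatbot_instruction_prompts",
        "HuggingFaceH4/grok-conversation-harmless",
        "Harelix/Prompt-Injection-Mixed-Techniques-2024",
        "OpenSafetyLab/Salad-Data",
        "jackhhao/jailbreak-classification",
    ]),
    # README writes this as `natolambert/xstest-v2-copy:1_full_compliance`;
    # the suffix is assumed to be a subset qualifier and is omitted here.
    "CC-BY-4.0": (1, ["natolambert/xstest-v2-copy"]),
}

# `datasets:` entries added to the YAML front matter in the same commit.
FRONT_MATTER_DATASETS = [
    "natolambert/xstest-v2-copy",
    "VMware/open-instruct",
    "alespalla/chatbot_instruction_prompts",
    "HuggingFaceH4/grok-conversation-harmless",
    "Harelix/Prompt-Injection-Mixed-Techniques-2024",
    "OpenSafetyLab/Salad-Data",
    "jackhhao/jailbreak-classification",
]

# Total number of source datasets according to the per-license counts.
total = sum(count for count, _ in LICENSE_SUMMARY.values())

# Datasets the README names explicitly, flattened across licenses.
named = [name for _, names in LICENSE_SUMMARY.values() for name in names]

# Named datasets that are absent from the front matter (expected: none).
missing = sorted(set(named) - set(FRONT_MATTER_DATASETS))

# Datasets counted in the summary but not named anywhere in the README.
unnamed = total - len(named)

print(f"total={total} named={len(named)} unnamed={unnamed} missing={missing}")
```

Under these assumptions, every explicitly named dataset also appears in the front matter, while the MIT, CC0, and unlicensed datasets are counted but not identified by name.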