kyuz0 commited on
Commit
8c088eb
·
verified ·
1 Parent(s): 695ef8d

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +40 -0
README.md CHANGED
@@ -1,3 +1,43 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
  license: apache-2.0
3
  library_name: transformers
 
1
+ # Model Card for LLM-Prompt-Injection-Detection
2
+
3
+ ## Model Description
4
+ This model, based on DistilBERT, is fine-tuned to detect CVs that might contain prompt injection attacks. It aims to provide an automated way to screen for potentially harmful content within the submitted text data, enhancing security measures in applications that process CVs and other forms of textual input.
5
+
6
+ ## Intended Use
7
+ This model is intended for use in environments where security is paramount, particularly in systems that process large volumes of CVs or other textual data. It's designed to help identify attempts at prompt injection, which could be used to manipulate the behavior of language models or automated systems. The model can be integrated into a pipeline for pre-screening submissions, flagging those that require further human review.
8
+
9
+ ## Training Data
10
+ The training data consists of a custom dataset specifically compiled for detecting prompt injection attacks within text data. This dataset includes examples of normal text (e.g., standard CV content) and various forms of prompt injections. Each entry is labeled accordingly, allowing the model to learn the characteristics of potentially malicious content.
11
+
12
+ ## Model Architecture
13
+ The model leverages the `bert-base-uncased` architecture, which has been proven effective for a wide range of natural language processing tasks. This choice provides a solid foundation for understanding the nuanced patterns that differentiate regular text from prompt injections.
14
+
15
+ ## Training Procedure
16
+ Training was conducted on a split of 80% of the dataset for training and 20% for testing, ensuring a comprehensive learning process. The model was trained for 5 epochs, using the Adam optimizer with a learning rate of 0.0001. Training utilized a batch size of 16 for both training and evaluation phases. Performance was monitored through accuracy, precision, recall, and F1 score, allowing for a detailed understanding of the model's capabilities.
17
+
18
+ ## Evaluation Results
19
+ The model demonstrated effective performance in identifying prompt injection attacks, with detailed metrics available in the training output. Evaluation was conducted on the test set, further validating the model's ability to generalize beyond the training data.
20
+
21
+ ## Limitations and Considerations
22
+ While the model provides a valuable tool for detecting prompt injection attacks, it is not infallible. Users should be aware of the potential for false positives and false negatives. The model's performance can vary based on the specific characteristics of the input data and the complexity of potential attacks. It is recommended to use this model as part of a comprehensive security strategy that includes manual review and other automated checks.
23
+
24
+ ## How to Use
25
+ This model can be loaded and used for inference using the Hugging Face Transformers library. Example code for loading the model:
26
+
27
+ ```python
28
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
29
+
30
+ model_name = "your_hf_repo/llm-prompt-injection-detection"
31
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
32
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
33
+
34
+ # Example prediction
35
+ text = "Example CV text or potential prompt injection content"
36
+ inputs = tokenizer(text, return_tensors="pt")
37
+ outputs = model(**inputs)
38
+ prediction = outputs.logits.argmax(-1)
39
+
40
+
41
  ---
42
  license: apache-2.0
43
  library_name: transformers