idanpers commited on
Commit
f143a56
·
verified ·
1 Parent(s): 1594d85
Files changed (3) hide show
  1. README.md +39 -21
  2. pytorch_model.bin +1 -1
  3. training_args.bin +1 -1
README.md CHANGED
@@ -1,35 +1,53 @@
1
- # ELECTRA Trainer for Prompt Injection Detection
 
 
 
 
 
 
 
 
 
2
 
3
- Google Colab notebook link: https://colab.research.google.com/drive/11da3m_gYwmkURcjGn8_kp23GiM-INDrm?usp=sharing
4
- ## Overview
5
 
6
- This repository contains a fine-tuned ELECTRA model designed for detecting prompt injections in AI systems. The model classifies input prompts into two categories: benign and jailbreak. This approach aims to enhance the safety and robustness of AI applications.
7
 
8
- ## Approach and Design Decisions
9
 
10
- The primary goal of this project was to create a reliable model that can distinguish between safe and potentially harmful prompts. Key design decisions included:
11
 
12
- - **Model Selection**: We chose the ELECTRA model due to its efficient training process and strong performance on text classification tasks. ELECTRA's architecture allows for effective learning from limited data, which is crucial given the specificity of our task.
13
-
14
- - **Data Preparation**: A custom dataset was curated, consisting of diverse prompts labeled as either benign or jailbreak. The dataset aimed to balance both classes to mitigate biases during training.
15
 
 
16
 
17
- ## Model Architecture and Training Strategy
18
 
19
- The model is based on the `google/electra-base-discriminator` architecture. Here’s an overview of the training strategy:
20
 
21
- 1. **Tokenization**: We utilized the ELECTRA tokenizer to prepare input prompts. Padding and truncation were handled to ensure uniform input sizes.
22
 
23
- 2. **Training Configuration**:
24
- - **Learning Rate**: Set to 5e-05 for stable convergence.
25
- - **Batch Size**: A batch size of 16 was chosen to balance training speed and memory usage.
26
- - **Epochs**: The model was trained for 2 epochs to prevent overfitting while still allowing sufficient learning from the dataset.
27
 
28
- 3. **Evaluation**: The model’s performance was evaluated on a validation set, focusing on metrics such as accuracy, precision, recall, and F1 score.
29
 
30
- ## Key Results and Observations
 
 
 
 
 
 
 
31
 
32
- - The model achieved a high accuracy rate on the validation set, indicating its effectiveness in distinguishing between benign and harmful prompts.
33
 
34
- ## Instructions for Running the Inference Pipeline
35
- - Simply input your text into Classify_Prompt function and get your results as the classification itself and the confidence of the model.
 
 
 
 
 
 
 
1
+ ---
2
+ library_name: transformers
3
+ license: apache-2.0
4
+ base_model: google/electra-base-discriminator
5
+ tags:
6
+ - generated_from_trainer
7
+ model-index:
8
+ - name: electra-trainer
9
+ results: []
10
+ ---
11
 
12
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
13
+ should probably proofread and complete it, then remove this comment. -->
14
 
15
+ # electra-trainer
16
 
17
+ This model is a fine-tuned version of [google/electra-base-discriminator](https://huggingface.co/google/electra-base-discriminator) on the None dataset.
18
 
19
+ ## Model description
20
 
21
+ More information needed
 
 
22
 
23
+ ## Intended uses & limitations
24
 
25
+ More information needed
26
 
27
+ ## Training and evaluation data
28
 
29
+ More information needed
30
 
31
+ ## Training procedure
 
 
 
32
 
33
+ ### Training hyperparameters
34
 
35
+ The following hyperparameters were used during training:
36
+ - learning_rate: 5e-05
37
+ - train_batch_size: 16
38
+ - eval_batch_size: 16
39
+ - seed: 42
40
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
41
+ - lr_scheduler_type: linear
42
+ - num_epochs: 2
43
 
44
+ ### Training results
45
 
46
+
47
+
48
+ ### Framework versions
49
+
50
+ - Transformers 4.44.2
51
+ - Pytorch 2.5.0+cu121
52
+ - Datasets 3.1.0
53
+ - Tokenizers 0.19.1
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:61abf2fe407a23d4e9922288dd118f9021a002ab00de5e1cdcbad2e17510a00b
3
  size 438004526
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:26d5e203a0b14973b5cd8ccaa7b46c02465b6336e480cadb01ff291da45f74a5
3
  size 438004526
training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:150d750e1c96404bb7639b0cc82138aa129ea914c3792b6626c731d0388a4e3a
3
  size 5176
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:94c4b284af0e8fa2eafed95635309a21bbf3bbe918b6d160fd8186ae9634afaa
3
  size 5176