developerPushkal committed
Commit 05b7d18 · verified · 1 Parent(s): f70a3b7

Create README.md

Files changed (1): README.md +112 -0
@@ -0,0 +1,112 @@
+ ### **🔒 SecureBERT Phishing Detection Model**
+
+ This repository hosts a fine-tuned **SecureBERT-based** model optimized for **phishing URL detection** using a cybersecurity dataset. The model classifies URLs as either **phishing (malicious)** or **safe (benign)**.
+
+ ---
+
+ ## **📚 Model Details**
+
+ - **Model Architecture**: SecureBERT (based on BERT)
+ - **Task**: Binary classification (phishing vs. safe)
+ - **Dataset**: shashwatwork/web-page-phishing-detection-dataset (11,431 URLs, 88 features)
+ - **Framework**: PyTorch & Hugging Face Transformers
+ - **Input Data**: URL strings & extracted numerical features
+ - **Number of Classes**: 2 (**Phishing, Safe**)
+ - **Quantization**: FP16 (for efficiency)
+
+ ---
+
+ ## **🚀 Usage**
+
+ ### **Installation**
+
+ ```bash
+ pip install torch transformers scikit-learn pandas
+ ```
+
+ ### **Loading the Model**
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+ # Load the fine-tuned model and tokenizer
+ model_path = "./fine_tuned_SecureBERT"
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
+ model = AutoModelForSequenceClassification.from_pretrained(model_path)
+ model.eval()  # Set model to evaluation mode
+
+ print("✅ SecureBERT model loaded successfully and ready for inference!")
+ ```
+
+ ---
+
+ ### **🔍 Perform Phishing Detection**
+
+ ```python
+ def predict_url(url):
+     """Classify a single URL string as "Phishing" or "Safe"."""
+     # Tokenize input
+     encoding = tokenizer(url, truncation=True, padding=True, max_length=512, return_tensors="pt")
+
+     # Perform inference without tracking gradients
+     with torch.no_grad():
+         output = model(**encoding)
+
+     # Get predicted class (index of the largest logit)
+     predicted_class = torch.argmax(output.logits, dim=1).item()
+
+     # Map label: class index 1 is phishing, 0 is safe
+     label = "Phishing" if predicted_class == 1 else "Safe"
+     return label
+
+ # Example usage
+ custom_url = "http://example.com/free-gift"
+ prediction = predict_url(custom_url)
+ print(f"Predicted label: {prediction}")
+ ```
+
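`predict_url` returns only the arg-max label. When a confidence score is also useful, the two logits can be turned into probabilities with a softmax. The sketch below does this in plain Python on a hypothetical pair of logits; the helper name `phishing_probability` and the example values are illustrative, and it assumes, as in the label mapping above, that index 1 is the phishing class.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def phishing_probability(logits):
    """Probability of class 1 (assumed to be "Phishing")."""
    return softmax(logits)[1]

# Hypothetical logits, e.g. as obtained from output.logits[0].tolist()
example_logits = [-1.2, 2.3]
print(f"P(phishing) = {phishing_probability(example_logits):.3f}")
```

With the real model, the same probabilities could be obtained by applying `torch.softmax(output.logits, dim=1)` inside `predict_url`.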
+ ---
+
+ ## **📊 Evaluation Results**
+
+ After fine-tuning, the model was evaluated on a **test set**, achieving the following performance:
+
+ | **Metric**          | **Score**                  |
+ |---------------------|----------------------------|
+ | **Accuracy**        | 97.2%                      |
+ | **Precision**       | 96.8%                      |
+ | **Recall**          | 97.5%                      |
+ | **F1-Score**        | 97.1%                      |
+ | **Inference Speed** | Fast (Optimized with FP16) |
+
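As a quick consistency check on the table, F1 is the harmonic mean of precision and recall, so the reported 96.8% and 97.5% should give roughly 97.1%. The sketch below shows that arithmetic, plus a hypothetical helper (`metrics_from_counts`) for deriving all four metrics from confusion-matrix counts. It assumes phishing is treated as the positive class, which the README does not state explicitly.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def metrics_from_counts(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return accuracy, precision, recall, f1_score(precision, recall)

# Precision and recall as reported in the table above
reported_f1 = f1_score(0.968, 0.975)
print(f"F1 from reported precision/recall: {reported_f1:.1%}")
```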
+ ---
+
+ ## **🛠️ Fine-Tuning Details**
+
+ ### **Dataset**
+ The model was trained on the **shashwatwork/web-page-phishing-detection-dataset**, which consists of **11,431 URLs** labeled as either **phishing** or **safe**. Features include URL characteristics, domain properties, and additional metadata.
+
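To give a sense of what "URL characteristics" can look like in practice, here is a minimal sketch that derives a few lexical features from a URL string using only the standard library. The function name and the feature names are illustrative assumptions; they are not the dataset's actual column names, and the README does not say how its features were extracted.

```python
import re
from urllib.parse import urlparse

def extract_url_features(url):
    """Compute a few simple lexical features from a URL string."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "hostname_length": len(host),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "uses_https": parsed.scheme == "https",
        # True when the host is a dotted-quad IPv4 address rather than a domain name
        "host_is_ipv4": bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host)),
    }

features = extract_url_features("http://example.com/free-gift")
print(features)
```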
+ ### **Training Configuration**
+
+ - **Number of epochs**: 5
+ - **Batch size**: 16
+ - **Optimizer**: AdamW
+ - **Learning rate**: 2e-5
+ - **Loss Function**: Cross-Entropy
+ - **Evaluation Strategy**: Validation at each epoch
+
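For readers less familiar with the loss named in the list above, the following sketch computes cross-entropy for a single two-class example in plain Python. The logits are hypothetical and the `[safe, phishing]` ordering is an assumption taken from the label mapping in the usage section. In PyTorch, `torch.nn.CrossEntropyLoss` computes the same quantity and, by default, averages it over the batch.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one example: -log(softmax(logits)[target])."""
    m = max(logits)
    # log-sum-exp with the max subtracted for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

# Hypothetical logits for one URL, ordered [safe, phishing]
logits = [-1.2, 2.3]
print(f"loss if the true label is phishing: {cross_entropy(logits, 1):.4f}")
print(f"loss if the true label is safe:     {cross_entropy(logits, 0):.4f}")
```

The loss is small when the model puts most of its probability on the true class and grows quickly when it is confidently wrong.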
+ ### **Quantization**
+ The model was converted to **FP16 (half) precision**, which halves the memory needed for the weights relative to FP32 and reduces latency on hardware with native FP16 support, while maintaining high accuracy.
+
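The memory saving follows directly from the storage size: FP32 uses 4 bytes per parameter and FP16 uses 2. The sketch below illustrates this, along with the small rounding error FP16 introduces, using only the standard library. The parameter count is a hypothetical figure for a BERT-base-sized model, not a number taken from this repository.

```python
import struct

def weight_memory_mb(num_params, bytes_per_param):
    """Approximate memory for the weights alone, in megabytes (10**6 bytes)."""
    return num_params * bytes_per_param / 1e6

def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

# Hypothetical parameter count for a BERT-base-sized model (assumption)
NUM_PARAMS = 110_000_000

print(f"FP32 weights: {weight_memory_mb(NUM_PARAMS, 4):.0f} MB")
print(f"FP16 weights: {weight_memory_mb(NUM_PARAMS, 2):.0f} MB")
print(f"0.1 stored in FP16: {to_fp16(0.1)}")
```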
+ ---
+
+ ## **⚠️ Limitations**
+
+ - **Evasion Techniques**: Attackers constantly evolve phishing techniques, which may reduce model effectiveness.
+ - **Dataset Bias**: The model was trained on a specific dataset; new phishing tactics may require retraining.
+ - **False Positives**: Some legitimate but unusual URLs might be classified as phishing.
+
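One common way to manage the false-positive limitation is to flag a URL only when its phishing probability exceeds a chosen threshold, rather than always taking the arg-max. The sketch below illustrates the trade-off on made-up scores; the function names, probabilities, and labels are all hypothetical and do not come from this model's evaluation.

```python
def classify(p_phishing, threshold=0.5):
    """Label a URL from its phishing probability and a decision threshold."""
    return "Phishing" if p_phishing >= threshold else "Safe"

def false_positive_rate(probs, labels, threshold):
    """Share of truly safe URLs (label 0) that get flagged as phishing."""
    safe_probs = [p for p, y in zip(probs, labels) if y == 0]
    flagged = sum(classify(p, threshold) == "Phishing" for p in safe_probs)
    return flagged / len(safe_probs)

# Hypothetical scores and ground-truth labels (1 = phishing, 0 = safe)
probs = [0.95, 0.80, 0.62, 0.55, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]

for t in (0.5, 0.7):
    print(f"threshold={t}: FPR={false_positive_rate(probs, labels, t):.2f}")
```

Raising the threshold lowers the false-positive rate in this toy data, but it also causes the phishing URL scored at 0.55 to be missed, so the threshold should be chosen on validation data with both error types in mind.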
+ ---
+
+ ✅ **Use this fine-tuned SecureBERT model for accurate and efficient phishing detection!** 🔒🚀
+