### **🔒 SecureBERT Phishing Detection Model**

This repository hosts a **SecureBERT-based** model fine-tuned for **phishing URL detection** on a phishing-detection dataset. The model classifies each URL as either **phishing (malicious)** or **safe (benign)**.

---

## **📚 Model Details**

- **Model Architecture**: SecureBERT (a RoBERTa-based, BERT-family encoder)
- **Task**: Binary Classification (Phishing vs. Safe)  
- **Dataset**: shashwatwork/web-page-phishing-detection-dataset (11,431 URLs, 88 features)  
- **Framework**: PyTorch & Hugging Face Transformers  
- **Input Data**: URL strings & extracted numerical features  
- **Number of Classes**: 2 (**Phishing, Safe**)  
- **Quantization**: FP16 half precision (for efficiency)

---

## **🚀 Usage**

### **Installation**  

```bash
pip install torch transformers scikit-learn pandas
```

### **Loading the Model**  

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned model and tokenizer
model_path = "./fine_tuned_SecureBERT"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path)
model.eval()  # Set model to evaluation mode

print("✅ SecureBERT model loaded successfully and ready for inference!")
```

---

### **🔍 Perform Phishing Detection**

```python
def predict_url(url):
    # Tokenize input
    encoding = tokenizer(url, truncation=True, padding=True, max_length=512, return_tensors="pt")
    
    # Perform inference
    with torch.no_grad():
        output = model(**encoding)
    
    # Get predicted class
    predicted_class = torch.argmax(output.logits, dim=1).item()
    
    # Map label
    label = "Phishing" if predicted_class == 1 else "Safe"
    return label

# Example usage
custom_url = "http://example.com/free-gift"
prediction = predict_url(custom_url)
print(f"Predicted label: {prediction}")
```

---

## **📊 Evaluation Results**

After fine-tuning, the model was evaluated on a **test set**, achieving the following performance:  

| **Metric**        | **Score**  |
|------------------|-----------|
| **Accuracy**      | 97.2%     |
| **Precision**     | 96.8%     |
| **Recall**        | 97.5%     |
| **F1-Score**      | 97.1%     |
| **Inference Speed** | Fast (Optimized with FP16) |

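For reference, the four scores in the table are standard binary-classification metrics with phishing as the positive class. The following is a minimal sketch of how such scores are computed from predictions; the function name `binary_metrics` and the toy labels are illustrative assumptions, not the evaluation script used for this model.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary task.

    `positive` is the label treated as "phishing" (assumed to be 1 here,
    matching the label mapping used in `predict_url`).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration only (1 = phishing, 0 = safe).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```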
---

## **🛠️ Fine-Tuning Details**

### **Dataset**  
The model was trained on the **shashwatwork/web-page-phishing-detection-dataset**, which consists of **11,431 URLs** labeled as either **phishing** or **safe**. Features include URL characteristics, domain properties, and additional metadata.

### **Training Configuration**  

- **Number of epochs**: 5  
- **Batch size**: 16  
- **Optimizer**: AdamW  
- **Learning rate**: 2e-5  
- **Loss Function**: Cross-Entropy  
- **Evaluation Strategy**: Validation at each epoch  

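As a rough worked example of what this configuration implies, the sketch below estimates the number of optimizer steps. The 80/20 train/validation split is an assumption (the split is not stated here), as is the helper name `estimate_steps`; no gradient accumulation is assumed.

```python
import math


def estimate_steps(num_examples, train_fraction, batch_size, epochs):
    """Return (steps_per_epoch, total_steps) for a simple training loop."""
    num_train = int(num_examples * train_fraction)
    steps_per_epoch = math.ceil(num_train / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


# 11,431 URLs, batch size 16 and 5 epochs come from this model card;
# the 0.8 train fraction is a hypothetical value for illustration.
steps_per_epoch, total_steps = estimate_steps(11_431, 0.8, 16, 5)
print(steps_per_epoch, total_steps)
```

Under these assumptions, validation at each epoch would run five times, once after every `steps_per_epoch` optimizer steps.
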
### **Quantization**  
The model weights were converted to **FP16 (half precision)**, which reduces memory usage and latency while maintaining high accuracy.

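To make the memory claim concrete: FP16 stores each weight in 2 bytes instead of the 4 bytes used by FP32, so weight storage is roughly halved. The sketch below works through the arithmetic; the parameter count of 125 million is an assumed, approximate figure for a base-sized encoder, not a number reported for this model.

```python
# Bytes used to store one parameter in each floating-point format.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_memory_mib(num_params, dtype):
    """Approximate memory needed for the model weights alone, in MiB."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 2)


# Hypothetical parameter count, for illustration only.
num_params = 125_000_000
fp32_mib = weight_memory_mib(num_params, "fp32")
fp16_mib = weight_memory_mib(num_params, "fp16")
print(f"FP32: {fp32_mib:.0f} MiB, FP16: {fp16_mib:.0f} MiB")
```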
---

## **⚠️ Limitations**

- **Evasion Techniques**: Attackers constantly evolve phishing techniques, which may reduce model effectiveness.  
- **Dataset Bias**: The model was trained on a specific dataset; new phishing tactics may require retraining.  
- **False Positives**: Some legitimate but unusual URLs might be classified as phishing.  

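One common way to trade some recall for fewer false positives is to flag a URL only when the phishing probability exceeds a threshold, rather than taking the argmax. The sketch below assumes two logits ordered as [safe, phishing], consistent with the label mapping in `predict_url`; the function names and the 0.9 threshold are illustrative assumptions, and any threshold would need tuning on validation data.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_threshold(logits, threshold=0.9):
    """Return "Phishing" only if P(phishing) >= threshold, else "Safe"."""
    phishing_prob = softmax(logits)[1]
    return "Phishing" if phishing_prob >= threshold else "Safe"


# Made-up logits: the model leans towards phishing, but not strongly.
logits = [0.0, 1.0]
print(label_with_threshold(logits, threshold=0.5))  # argmax-like behaviour
print(label_with_threshold(logits, threshold=0.9))  # stricter decision rule
```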
---

✅ **Use this fine-tuned SecureBERT model for accurate and efficient phishing detection!** 🔒🚀