AventIQ-AI
/

Securebert-website-phishing-prediction

Safetensors

roberta


developerPushkal committed on Mar 7

Commit

05b7d18

verified ·

1 Parent(s): f70a3b7

Create README.md


README.md +112 -0


	@@ -0,0 +1,112 @@

+### **🔒 SecureBERT Phishing Detection Model**
+This repository hosts a fine-tuned **SecureBERT-based** model optimized for **phishing URL detection** using a cybersecurity dataset. The model classifies URLs as either **phishing (malicious)** or **safe (benign)**.
+---
+## **📚 Model Details**
+- **Model Architecture**: SecureBERT (based on RoBERTa)
+- **Task**: Binary Classification (Phishing vs. Safe)
+- **Dataset**: shashwatwork/web-page-phishing-detection-dataset (11,431 URLs, 88 features)
+- **Framework**: PyTorch & Hugging Face Transformers
+- **Input Data**: URL strings & extracted numerical features
+- **Number of Classes**: 2 (**Phishing, Safe**)
+- **Quantization**: FP16 half precision (for lower memory use and faster inference)
+---
+## **🚀 Usage**
+### **Installation**
+```bash
+pip install torch transformers scikit-learn pandas
+```
+### **Loading the Model**
+```python
+import torch
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+# Load the fine-tuned model and tokenizer
+model_path = "./fine_tuned_SecureBERT"  # local checkpoint directory; adjust to where the model was saved
+tokenizer = AutoTokenizer.from_pretrained(model_path)
+model = AutoModelForSequenceClassification.from_pretrained(model_path)
+model.eval()  # Set model to evaluation mode
+print("✅ SecureBERT model loaded successfully and ready for inference!")
+```
+---
+### **🔍 Perform Phishing Detection**
+```python
+def predict_url(url):
+    # Tokenize input
+    encoding = tokenizer(url, truncation=True, padding=True, max_length=512, return_tensors="pt")
+    # Perform inference
+    with torch.no_grad():
+        output = model(**encoding)
+    # Get predicted class
+    predicted_class = torch.argmax(output.logits, dim=1).item()
+    # Map label
+    label = "Phishing" if predicted_class == 1 else "Safe"
+    return label
+# Example usage
+custom_url = "http://example.com/free-gift"
+prediction = predict_url(custom_url)
+print(f"Predicted label: {prediction}")
+```
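`predict_url` returns only the argmax label. To also report a confidence score, the logits can be passed through a softmax. The sketch below shows that arithmetic in plain Python on made-up logits. The function names and example values are illustrative, and it assumes index 1 means phishing, as in the label mapping above. With the real model, `torch.softmax(output.logits, dim=1)` should give the same probabilities.

```python
import math


def softmax(logits):
    """Convert raw class scores (logits) into probabilities that sum to 1."""
    peak = max(logits)
    # Subtracting the maximum keeps exp() from overflowing on large logits.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits, phishing_index=1):
    """Return the predicted label and the probability assigned to it."""
    probs = softmax(logits)
    predicted = max(range(len(probs)), key=probs.__getitem__)
    label = "Phishing" if predicted == phishing_index else "Safe"
    return label, probs[predicted]


# Hypothetical logits for a single URL, ordered [safe, phishing].
example_logits = [-1.2, 2.3]
label, confidence = label_with_confidence(example_logits)
print(f"{label} (confidence {confidence:.3f})")
```

A confidence value like this can be compared against a stricter cutoff than 0.5 when false positives are costly.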
+---
+## **📊 Evaluation Results**
+After fine-tuning, the model was evaluated on a **test set**, achieving the following performance:
+| **Metric**        | **Score**  |
+|------------------|-----------|
+| **Accuracy**      | 97.2%     |
+| **Precision**     | 96.8%     |
+| **Recall**        | 97.5%     |
+| **F1-Score**      | 97.1%     |
+| **Inference Speed** | Fast (Optimized with FP16) |
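As a quick consistency check on the table, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. The sketch below does that. The helper names are illustrative, and the count-based definitions are the standard ones rather than anything taken from the training code.

```python
def precision(true_positives, false_positives):
    """Share of URLs flagged as phishing that really are phishing."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives, false_negatives):
    """Share of real phishing URLs that the model flagged."""
    return true_positives / (true_positives + false_negatives)


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Values copied from the evaluation table.
reported_precision = 0.968
reported_recall = 0.975

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 from reported precision and recall: {f1:.1%}")
```

The recomputed value agrees with the reported F1-Score of 97.1% after rounding.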
+---
+## **🛠️ Fine-Tuning Details**
+### **Dataset**
+The model was trained on the **shashwatwork/web-page-phishing-detection-dataset**, which consists of **11,431 URLs** labeled as either **phishing** or **safe**. Features include URL characteristics, domain properties, and additional metadata.
+### **Training Configuration**
+- **Number of epochs**: 5
+- **Batch size**: 16
+- **Optimizer**: AdamW
+- **Learning rate**: 2e-5
+- **Loss Function**: Cross-Entropy
+- **Evaluation Strategy**: Validation at each epoch
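To give a feel for the training budget these settings imply, the sketch below collects them in a plain dictionary and estimates the number of optimizer steps. The 80/20 train/test split is an assumption (the README does not state the split), and the dictionary keys are illustrative names, not arguments of any particular library.

```python
import math

# Hyperparameters as listed above; key names are illustrative.
training_config = {
    "num_epochs": 5,
    "batch_size": 16,
    "optimizer": "AdamW",
    "learning_rate": 2e-5,
    "loss": "cross_entropy",
}

DATASET_SIZE = 11_431          # from the dataset description
ASSUMED_TRAIN_FRACTION = 0.8   # assumption: split is not given in the README

train_examples = int(DATASET_SIZE * ASSUMED_TRAIN_FRACTION)
# The last batch of an epoch may be smaller, hence the ceiling.
steps_per_epoch = math.ceil(train_examples / training_config["batch_size"])
total_steps = steps_per_epoch * training_config["num_epochs"]

print(f"{train_examples} training examples")
print(f"{steps_per_epoch} steps per epoch, {total_steps} steps in total")
```

With validation at each epoch, this would mean one evaluation pass after every `steps_per_epoch` optimizer steps.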
+### **Quantization**
+The model was converted to **FP16 (half) precision**, which halves the memory needed for the weights relative to FP32 and can reduce latency on hardware with FP16 support, while maintaining high accuracy.
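The trade-off behind FP16 can be illustrated without loading the model: each weight takes 2 bytes instead of 4, at the cost of a small rounding error per value. The sketch below uses Python's `struct` module, whose `e` format code is IEEE 754 half precision. The parameter count is an assumption (roughly the size of a RoBERTa-base encoder), not a figure from this repository. For the real model, PyTorch's `model.half()` performs the conversion.

```python
import struct


def to_fp16_and_back(value):
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def weights_size_mb(num_parameters, bytes_per_parameter):
    """Approximate size of the weights in megabytes."""
    return num_parameters * bytes_per_parameter / 1e6


ASSUMED_PARAMETERS = 125_000_000  # assumption: roughly RoBERTa-base sized

fp32_mb = weights_size_mb(ASSUMED_PARAMETERS, 4)  # 4 bytes per FP32 weight
fp16_mb = weights_size_mb(ASSUMED_PARAMETERS, 2)  # 2 bytes per FP16 weight
print(f"FP32 weights: {fp32_mb:.0f} MB, FP16 weights: {fp16_mb:.0f} MB")

original = 0.1
rounded = to_fp16_and_back(original)
print(f"{original} stored as FP16 becomes {rounded}")
```

The rounding error per weight is small, which is why classification accuracy usually changes little after the conversion, though this should be confirmed on the test set.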
+---
+## **⚠️ Limitations**
+- **Evasion Techniques**: Attackers constantly evolve phishing techniques, which may reduce model effectiveness.
+- **Dataset Bias**: The model was trained on a specific dataset; new phishing tactics may require retraining.
+- **False Positives**: Some legitimate but unusual URLs might be classified as phishing.
+---
+✅ **Use this fine-tuned SecureBERT model for accurate and efficient phishing detection!** 🔒🚀