### **🔒 SecureBERT Phishing Detection Model**

This repository hosts a fine-tuned **SecureBERT-based** model optimized for **fraudulent website detection** using a cybersecurity dataset. The model classifies URLs as either **phishing (malicious)** or **safe (benign)**.

---

## **📚 Model Details**

- **Model Architecture**: SecureBERT (based on BERT)
- **Task**: Binary Classification (Phishing vs. Safe)
- **Dataset**: shashwatwork/web-page-phishing-detection-dataset (11,431 URLs, 88 features)
- **Framework**: PyTorch & Hugging Face Transformers
- **Input Data**: URL strings & extracted numerical features
- **Number of Classes**: 2 (**Phishing, Safe**)
- **Quantization**: FP16 (half precision, for efficiency)

---

## **🚀 Usage**

### **Installation**

```bash
pip install torch transformers scikit-learn pandas
```

### **Loading the Model**

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned model and tokenizer from a local directory
model_path = "./fine_tuned_SecureBERT"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path)
model.eval()  # Set model to evaluation mode (disables dropout)

print("✅ SecureBERT model loaded successfully and ready for inference!")
```

---

### **🔍 Perform Phishing Detection**

```python
def predict_url(url):
    """Classify a single URL string as "Phishing" or "Safe"."""
    # Tokenize input (URLs longer than 512 tokens are truncated)
    encoding = tokenizer(
        url, truncation=True, padding=True, max_length=512, return_tensors="pt"
    )

    # Perform inference without tracking gradients
    with torch.no_grad():
        output = model(**encoding)

    # Get predicted class (index of the largest logit)
    predicted_class = torch.argmax(output.logits, dim=1).item()

    # Map label: class 1 is phishing, class 0 is safe
    label = "Phishing" if predicted_class == 1 else "Safe"
    return label

# Example usage
custom_url = "http://example.com/free-gift"
prediction = predict_url(custom_url)
print(f"Predicted label: {prediction}")
```

---

## **📊 Evaluation Results**

After fine-tuning, the model was evaluated on a held-out **test set**, achieving the following performance:

| **Metric**          | **Score**                  |
|---------------------|----------------------------|
| **Accuracy**        | 97.2%                      |
| **Precision**       | 96.8%                      |
| **Recall**          | 97.5%                      |
| **F1-Score**        | 97.1%                      |
| **Inference Speed** | Fast (Optimized with FP16) |

---

## **🛠️ Fine-Tuning Details**

### **Dataset**

The model was trained on the **shashwatwork/web-page-phishing-detection-dataset**, which consists of **11,431 URLs** labeled as either **phishing** or **safe**. Features include URL characteristics, domain properties, and additional metadata.

### **Training Configuration**

- **Number of epochs**: 5
- **Batch size**: 16
- **Optimizer**: AdamW
- **Learning rate**: 2e-5
- **Loss Function**: Cross-Entropy
- **Evaluation Strategy**: Validation at each epoch

### **Quantization**

The model weights were converted to **FP16 (half) precision**, reducing latency and memory usage while maintaining high accuracy.

---

## **⚠️ Limitations**

- **Evasion Techniques**: Attackers constantly evolve their phishing techniques, which may reduce the model's effectiveness over time.
- **Dataset Bias**: The model was trained on a single dataset; new phishing tactics may require retraining.
- **False Positives**: Some legitimate but unusual URLs might be classified as phishing.

---

✅ **Use this fine-tuned SecureBERT model for accurate and efficient phishing detection!** 🔒🚀
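
The scores in the Evaluation Results table follow the standard definitions for binary classification. As a minimal sketch of how such numbers can be derived, the snippet below assumes phishing is the positive class (label 1). The helper name `binary_metrics` and the labels are illustrative only; they are not the model's actual evaluation script or test data.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard every ratio against an empty denominator
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative labels only (1 = phishing, 0 = safe); not the real test set
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In practice, the same values can be obtained from `scikit-learn`, which is already listed in the installation command; the hand-written version is shown only to make the definitions explicit.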
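
Regarding the false-positive limitation: `predict_url` returns the argmax class, so a URL is flagged as soon as its phishing logit is marginally larger than the safe one. One possible mitigation is to convert the logits to probabilities and require a stricter threshold before flagging. This is a sketch under assumptions, not part of the released model: `classify_with_threshold` and the `0.8` threshold are hypothetical, index 1 is assumed to be the phishing class, and the function takes plain logit values (for example `output.logits[0].tolist()`) so it runs without PyTorch.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_threshold(logits, threshold=0.8):
    """Return (label, phishing_probability) for a pair of logits.

    Assumes index 0 is the safe class and index 1 is the phishing class.
    The threshold value is a hypothetical choice and should be tuned on
    validation data.
    """
    p_phishing = softmax(logits)[1]
    label = "Phishing" if p_phishing >= threshold else "Safe"
    return label, p_phishing


# Made-up logits: argmax would flag both, the threshold flags only the second
label_a, prob_a = classify_with_threshold([0.2, 0.9])
label_b, prob_b = classify_with_threshold([-1.5, 2.5])
print(label_a, round(prob_a, 3))
print(label_b, round(prob_b, 3))
```

Raising the threshold lowers the false-positive rate at the cost of recall, so the value should be chosen against the metric that matters most for the deployment.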