---
license: apache-2.0
datasets:
- Adnan-AI-Labs/CleanedBalancedPhishingUrls
language:
- en
base_model:
- distilbert/distilbert-base-uncased
tags:
- phishing_url
---

# Model Card for DistilBERT-PhishGuard

## Model Overview

**DistilBERT-PhishGuard** is a phishing URL detection model based on DistilBERT, fine-tuned to classify a URL as safe or phishing. It is designed for real-time use in web and email security, where it helps users identify malicious links.

## Intended Use

- **Use Cases**: URL classification for phishing detection in emails, websites, and chat applications.
- **Limitations**: This model may have reduced accuracy with non-English URLs or heavily obfuscated links.
- **Intended Users**: Security researchers, application developers, and cybersecurity engineers.
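
For the email and chat use cases listed above, URLs first have to be pulled out of the message text before they can be classified. The following is a minimal sketch of that step, not part of the released model: the regular expression, `extract_urls`, `scan_message`, and `placeholder_classifier` are all hypothetical names introduced for illustration, and the placeholder would be swapped for a call to the model.

```python
import re
from typing import Callable, Dict, List

# Simple pattern for http(s) URLs; an assumption for illustration, not a full URL grammar.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_urls(text: str) -> List[str]:
    """Return the http(s) URLs found in text, in order, without duplicates."""
    seen = set()
    urls = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:!?")  # drop trailing sentence punctuation
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def scan_message(text: str, classify: Callable[[str], str]) -> Dict[str, str]:
    """Map each URL found in text to the label returned by classify."""
    return {url: classify(url) for url in extract_urls(text)}


def placeholder_classifier(url: str) -> str:
    # Hypothetical stand-in so the sketch runs offline; a real deployment
    # would call the fine-tuned model here instead of this keyword check.
    return "Phishing" if "login" in url else "Safe"


message = "Reset your password at http://example.com/login. Docs: https://example.org/help"
print(scan_message(message, placeholder_classifier))
```

Passing the classifier in as a callable keeps the extraction logic independent of how the model is loaded, so the same function could serve an email filter or a chat bot.
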
## What Sets PhishGuard Apart?

- **High Accuracy**: Achieved up to 99.6% accuracy and 0.997 AUC on validation datasets.
- **Optimized for Speed**: Leveraging a distilled transformer model for faster predictions without compromising accuracy.
- **Real-World Data**: Trained and evaluated on diverse phishing and safe URLs, ensuring robust performance across domains.

## Performance Metrics (Averaged Across Epochs)

- Accuracy: 99.6%
- AUC (Area Under Curve): 0.997
- Training Loss: 0.054
- Validation Loss: 0.047

## Support the Project

If you find this project useful, consider buying me a coffee to support further development!

<a href="https://buymeacoffee.com/adnanailabs">
<img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me a Coffee">
</a>

## Usage

This model can be loaded and used with Hugging Face's `transformers` library:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the model and tokenizer ("your-username" is a placeholder for the repository owner)
tokenizer = AutoTokenizer.from_pretrained("your-username/DistilBERT-PhishGuard")
model = AutoModelForSequenceClassification.from_pretrained("your-username/DistilBERT-PhishGuard")

# Sample URL for classification
url = "http://example.com"
inputs = tokenizer(url, return_tensors="pt", truncation=True, max_length=256)

# Run the forward pass without tracking gradients (inference only)
with torch.no_grad():
    outputs = model(**inputs)

# Pick the highest-scoring class: label 1 is phishing, label 0 is safe
predictions = torch.argmax(outputs.logits, dim=-1)
print("Prediction:", "Phishing" if predictions.item() == 1 else "Safe")
```
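
The snippet above returns only the top class. Some applications may prefer a phishing probability with an adjustable cutoff, for example to reduce false alarms. The sketch below shows one way to do that in plain Python. It assumes, as in the snippet above, that index 1 is the phishing class; the helper names, the logits, and the 0.9 threshold are made up for illustration.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Convert raw logits to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_threshold(logits: List[float], threshold: float = 0.5) -> Tuple[str, float]:
    """Return (label, phishing probability), assuming index 1 is the phishing class."""
    phishing_prob = softmax(logits)[1]
    label = "Phishing" if phishing_prob >= threshold else "Safe"
    return label, phishing_prob


# Made-up logits; with the model loaded they would come from outputs.logits[0].tolist()
label, prob = label_with_threshold([0.3, 2.1], threshold=0.9)
print(label, round(prob, 3))
```

A stricter threshold trades missed phishing URLs for fewer false positives, so a suitable value depends on the application and would need to be chosen on validation data.
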

## Performance

Across the different chunks of training data, accuracy stays above 98%, and AUC is close to or at 1.00 in the later stages. This indicates consistent phishing detection across the data the model was trained and evaluated on.
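
To make the two metrics concrete, here is a small self-contained sketch of how accuracy and AUC can be computed from labels and phishing scores. It is not the evaluation code used for this model: the function names are illustrative, and the labels and scores are made up.

```python
from typing import List


def accuracy(labels: List[int], scores: List[float], threshold: float = 0.5) -> float:
    """Fraction of examples whose thresholded score matches the label."""
    correct = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """Probability that a random phishing URL outscores a random safe one (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = phishing) and phishing scores, for illustration only
labels = [1, 0, 1, 0, 1, 0]
scores = [0.95, 0.10, 0.80, 0.60, 0.40, 0.20]
print("accuracy:", accuracy(labels, scores))
print("AUC:", roc_auc(labels, scores))
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality over all thresholds, which is why the two can differ. The pairwise loop is quadratic in the number of examples; for larger evaluation sets a library routine such as scikit-learn's `roc_auc_score` is the usual choice.
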

## Limitations and Biases

- The model's performance may degrade on URLs that use obfuscation or novel phishing techniques.
- It may be less effective on non-English URLs and may need further fine-tuning for other languages or domain-specific URLs.

## Contact and Support

For questions, improvements, or support, please contact us through the Hugging Face community or open an issue in the model repository.