	---
	license: apache-2.0
	datasets:
	- Adnan-AI-Labs/CleanedBalancedPhishingUrls
	language:
	- en
	base_model:
	- distilbert/distilbert-base-uncased
	tags:
	- phishing_url
	---

	# Model Card for DistilBERT-PhishGuard

	## Model Overview
	URLShield-DistilBERT is a phishing URL detection model fine-tuned from DistilBERT to classify a URL as safe or phishing. It is designed for real-time use in web and email security, where it helps users identify malicious links.



	## Intended Use
	- Use Cases: URL classification for phishing detection in emails, websites, and chat applications.
	- Limitations: This model may have reduced accuracy with non-English URLs or heavily obfuscated links.
	- Intended Users: Security researchers, application developers, and cybersecurity engineers.



	## 🔍 What Sets PhishGuard Apart?
	- High Accuracy 📈 – Achieved up to 99.6% accuracy and 0.997 AUC on validation datasets.
	- Optimized for Speed 🚀 – Leveraging a distilled transformer model for faster predictions without compromising accuracy.
	- Real-World Data 🌐 – Trained and evaluated on diverse phishing and safe URLs, ensuring robust performance across domains.

	## 📊 Performance Metrics (Averaged Across Epochs)
	- Accuracy: 99.6%
	- AUC (Area Under Curve): 0.997
	- Training Loss: 0.054
	- Validation Loss: 0.047
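For readers who want to reproduce these two metrics on their own labelled URLs, the following is a minimal sketch of how accuracy and AUC can be computed from true labels and phishing scores. The function names, the 0.5 threshold, and the sample data are assumptions made for illustration; they are not taken from the project's training code.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of URLs whose thresholded score matches the true label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)


def roc_auc(labels, scores):
    """AUC as the chance that a random phishing URL outscores a random safe one.

    A tie between a phishing score and a safe score counts as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = phishing, 0 = safe) and phishing scores, for illustration only.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.92, 0.10, 0.45, 0.30, 0.81, 0.55]

print("accuracy:", accuracy(labels, scores))
print("AUC:", roc_auc(labels, scores))
```

The pairwise AUC loop is quadratic in the number of examples, which is fine for a sanity check; a library routine is a better fit for large evaluation sets.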

	## Support the Project

	If you find this project useful, consider buying me a coffee to support further development! ☕️

	<a href="https://buymeacoffee.com/adnanailabs">
	<img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me a Coffee">
	</a>

	## Usage
	This model can be loaded and used with Hugging Face's `transformers` library:

	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification
	import torch

	# Load the model and tokenizer (repository ID as shown on the model page)
	model_id = "Adnan-AI-Labs/URLShield-DistilBERT"
	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForSequenceClassification.from_pretrained(model_id)

	# Sample URL for classification
	url = "http://example.com"
	inputs = tokenizer(url, return_tensors="pt", truncation=True, max_length=256)

	# Run the forward pass without tracking gradients (inference only)
	with torch.no_grad():
	    outputs = model(**inputs)

	# Class index 1 is treated as phishing, index 0 as safe
	predictions = torch.argmax(outputs.logits, dim=-1)
	print("Prediction:", "Phishing" if predictions.item() == 1 else "Safe")
	```
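The `argmax` in the example above returns only a label and discards how confident the model was. As a rough sketch, the two logits can be turned into a phishing probability with a softmax and compared against a threshold. The `classify` helper and its `threshold` parameter are hypothetical names introduced here, and the logits are made-up stand-ins for `outputs.logits[0].tolist()`; the sketch assumes index 1 is the phishing class, as in the example above.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities that sum to 1 (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, threshold=0.5):
    """Return (label, phishing_probability) for one URL's two-class logits.

    Assumes index 1 is the phishing class. The threshold is a hypothetical
    tuning knob, not a value published with the model.
    """
    phishing_prob = softmax(logits)[1]
    label = "Phishing" if phishing_prob >= threshold else "Safe"
    return label, phishing_prob


# Made-up logits standing in for outputs.logits[0].tolist()
label, prob = classify([-1.2, 2.3])
print(label, round(prob, 3))
```

Raising the threshold trades missed phishing URLs for fewer false alarms, so the right value depends on the application and should be chosen on held-out data.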

	## Performance
	Across the successive chunks of training data, the model stayed above 98% accuracy, and its AUC approached or reached 1.00 in the later stages. These results suggest reliable phishing detection on URLs similar to those in the training data.

	## Limitations and Biases
	- The model's performance may degrade on URLs containing obfuscated or novel phishing techniques.
	- It may be less effective on non-English URLs and may need further fine-tuning for different languages or domain-specific URLs.
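One way to act on these limitations is to flag URLs that look obfuscated or non-English before trusting the model's verdict. The sketch below is a simple heuristic pre-check using only the Python standard library. It is not part of the model; the `review_flags` name and the percent-encoding cutoff of 5 are assumptions chosen for illustration.

```python
from urllib.parse import urlsplit


def review_flags(url):
    """Return reasons a URL may fall outside what the model handles well."""
    flags = []
    host = urlsplit(url).hostname or ""
    if not host:
        # Happens, for example, when the scheme is missing
        flags.append("no hostname parsed")
    if any(ord(ch) > 127 for ch in host):
        flags.append("non-ASCII hostname")
    if any(part.startswith("xn--") for part in host.split(".")):
        flags.append("punycode hostname")
    if url.count("%") >= 5:  # hypothetical cutoff, tune for your data
        flags.append("heavy percent-encoding")
    return flags


for candidate in ["http://example.com", "http://xn--80ak6aa92e.com/"]:
    print(candidate, review_flags(candidate))
```

URLs that come back with any flag could be routed to manual review or a secondary check instead of relying on the classifier alone.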

	## Contact and Support
	For questions, improvements, or support, please contact us through the Hugging Face community or open an issue in the model repository.