title: Spam Email Detection
emoji: π
colorFrom: pink
colorTo: blue
sdk: gradio
sdk_version: 3.17.0
app_file: app.py
Email Spam and Phishing URL Detection
This project utilizes Naive Bayes classification to detect whether an email is spam or not, and XGBoost classification to determine if a URL within an email is phishing or legitimate.
Getting Started
Project Overview
The project consists of two main components:
Email Spam Detection: This component employs Naive Bayes classification to classify emails as either spam or not spam based on their content features.
Phishing URL Detection: This component uses XGBoost classification to identify whether URLs within emails are associated with phishing attempts or legitimate websites.
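To make the first component concrete, here is a minimal from-scratch sketch of multinomial Naive Bayes with add-one smoothing. It is an illustration only, not the project's model: the class name TinyNaiveBayes, the tokenizer, and the four toy messages are assumptions invented for this example, and the project's own training code lives in the notebook.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for illustration (not taken from spam.csv).
texts = [
    "win a free prize now",
    "free money claim your prize",
    "meeting moved to monday",
    "see you at lunch tomorrow",
]
labels = ["spam", "spam", "ham", "ham"]
model = TinyNaiveBayes().fit(texts, labels)
```

Calling model.predict on a new message returns the label with the highest log score. Working in log space avoids numeric underflow when many small probabilities are multiplied.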
Prerequisites
Make sure you have Python 3.10 installed on your system. You can download it from the official Python website.
Requirements
Ensure you have the following dependencies installed. You can install them using pip install -r requirements.txt.
- gunicorn==22.0.0
- python-dateutil==2.8.2
- gradio==4.32.1
- gradio_client==0.17.0
- requests==2.31.0
- beautifulsoup4==4.12.3
- googlesearch_python==1.2.4
- urlextract==1.9.0
- numpy==1.26.3
- pandas==2.2.0
- scikit-learn==1.5.0
- urllib3==2.1.0
- python-whois==0.9.4
- xgboost==2.0.3
- lxml==5.2.2
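Once the packages are installed, one way to confirm that the environment matches these pins is to compare them against the installed versions. The sketch below uses only the standard library; the helper names parse_pins and find_mismatches are hypothetical and not part of the project.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(lines):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, pinned = line.partition("==")
        if sep:  # ignore lines that are not exact pins
            pins[name.strip()] = pinned.strip()
    return pins


def find_mismatches(pins, get_version=version):
    """Return {name: (pinned, installed_or_None)} for every unmet pin."""
    problems = {}
    for name, pinned in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            problems[name] = (pinned, installed)
    return problems
```

To use it, open requirements.txt, pass the file object to parse_pins, and pass the result to find_mismatches. An empty dict means every pinned package is installed at the expected version.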
Setup and Installation
Clone the repository:
git clone https://github.com/your-username/email-spam-phishing-detection.git
cd email-spam-phishing-detection
Install dependencies:
pip install -r requirements.txt
Usage
Data Preparation:
- Ensure the datasets spam.csv and urldata.csv are available in the data/ directory.
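A quick pre-flight check for the data preparation step might look like the following sketch. The file names and the data/ directory come from this README; the function name missing_datasets is a hypothetical helper, not something the project provides.

```python
from pathlib import Path

# Dataset file names as listed in this README.
REQUIRED_DATASETS = ("spam.csv", "urldata.csv")


def missing_datasets(data_dir="data", required=REQUIRED_DATASETS):
    """Return the required dataset files that are not present in data_dir."""
    base = Path(data_dir)
    return [name for name in required if not (base / name).is_file()]
```

Calling missing_datasets() from the repository root returns an empty list when both files are in place, and otherwise lists the ones still to be added.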
Model Training:
- If necessary, modify and run the notebook.ipynb Jupyter notebook to train or fine-tune the machine learning models.
- Trained models will be saved in the models/ directory.
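This README does not say how the notebook serializes the trained models. As one possibility, the sketch below saves and reloads a Python object with pickle from the standard library. Both the serialization format and the helper names save_model and load_model are assumptions, so check the notebook for the actual mechanism and file names.

```python
import pickle
from pathlib import Path


def save_model(model, path):
    """Serialize a trained model to disk, creating parent directories."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(model, fh)
    return path


def load_model(path):
    """Load a model previously written by save_model.

    Only unpickle files you trust: pickle can execute arbitrary code.
    """
    with Path(path).open("rb") as fh:
        return pickle.load(fh)
```

Keeping the save and load paths in one place helps the application and the notebook agree on where the models live.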
Run the Application:
- Execute app.py to start the application.
- Access the application at the Hugging Face Space.
Acknowledgements
- The email spam classification model is trained using the spam.csv dataset, sourced from Dataset: Spam/ham mail.
- The URL phishing detection model is trained using the urldata.csv dataset, sourced from Phishing Websites Dataset.
License
This project is licensed under the MIT License - see the LICENSE file for details.