---
title: Cyberattack Detection ML Approach
emoji: 🛡️
colorFrom: red
colorTo: pink
sdk: streamlit
sdk_version: 1.42.2
app_file: app.py
pinned: false
license: mit
short_description: UNSW Dataset in an ML-based analysis
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Cyberattack Detection ML Approach

This Streamlit app provides an interactive analysis of the UNSW-NB15 dataset, a widely used benchmark for evaluating network intrusion detection systems. It trains machine learning classifiers to label network traffic as normal or as one of several attack types.

## About the UNSW-NB15 Dataset

The UNSW-NB15 dataset was created in the Cyber Range Lab at the University of New South Wales (UNSW), Canberra. Its raw network traffic was captured with tcpdump, and flow-level features were then extracted from those captures. The dataset covers a range of modern attack types, which makes it useful for training and testing intrusion detection systems. Key features of the dataset include:

- **Diverse Attack Types:** Covers nine attack families: Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, and Worms.
- **Realistic Network Traffic:** Generated in a testbed that combines real normal activity with synthetic attack behavior.
- **Labeled Data:** Each network flow is labeled as normal traffic or with its attack type, which enables supervised learning.
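
To illustrate how the labels support supervised learning, the sketch below builds a tiny stand-in DataFrame and separates the features from a multi-class and a binary target. The column names `attack_cat` and `label` follow the published UNSW-NB15 training CSVs, but treat them as assumptions and check them against the copy of the dataset this app loads. The rows and the two feature columns are invented for illustration.

```python
import pandas as pd

# Hypothetical rows standing in for UNSW-NB15 flows; the real data has many more columns.
flows = pd.DataFrame(
    {
        "dur": [0.12, 1.50, 0.03, 0.80, 0.01, 2.20],
        "sbytes": [496, 1762, 200, 900, 114, 3000],
        "attack_cat": ["Normal", "Exploits", "Normal", "DoS", "Generic", "Exploits"],
        "label": [0, 1, 0, 1, 1, 1],
    }
)

# Multi-class view: one class per attack family plus normal traffic.
class_counts = flows["attack_cat"].value_counts()

# Binary view: 0 for normal traffic, 1 for any attack.
attack_share = flows["label"].mean()

# Features are every column except the two label columns.
X = flows.drop(columns=["attack_cat", "label"])
y_multiclass = flows["attack_cat"]

print(class_counts)
print(f"Share of attack flows: {attack_share:.2f}")
```

Checking the class counts first is worthwhile because intrusion datasets are often imbalanced, and rare classes can be hard for a classifier to learn.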

## App Purpose

This app aims to:

1. **Visualize and Explore the Data:** Provide an interface for viewing the dataset's structure, data types, and descriptive statistics, so users can understand the characteristics of UNSW-NB15.
2. **Train and Evaluate Machine Learning Models:** Train several machine learning classifiers and compare their performance:
   - Naive Bayes
   - Decision Tree
   - K-Nearest Neighbors
3. **Analyze Model Performance:** Present confusion matrices and classification reports that show how well each model detects the different attack types, exposing the strengths and weaknesses of each algorithm.
4. **Facilitate Learning:** Serve as an educational tool for learning about network intrusion detection, machine learning classification, and dataset analysis.
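
The train-and-compare workflow in items 2 and 3 can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the code in `app.py`: the data is synthetic (`make_classification` stands in for preprocessed flow features), `GaussianNB` is assumed as the Naive Bayes variant, and the hyperparameters are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for preprocessed, numeric flow features (assumption).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=6, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The three classifier families named above; settings are illustrative.
models = {
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Rows are true classes, columns are predicted classes.
    results[name] = confusion_matrix(y_test, y_pred)
    print(name)
    print(results[name])
    print(classification_report(y_test, y_pred, zero_division=0))
```

Using the same train/test split for every model keeps the comparison fair, since each confusion matrix then describes the same held-out flows.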

## Installation

To run this app, you need Python and the following libraries:

- streamlit
- datasets
- pandas
- huggingface_hub
- scikit-learn
- seaborn
- matplotlib
- numpy
- Pillow

You can install them with pip:

```bash
pip install streamlit datasets pandas huggingface_hub scikit-learn seaborn matplotlib numpy Pillow
```
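
To confirm the installation, a small check like the one below can report which libraries are still missing. It is a sketch using only the standard library; the helper name `missing_packages` is hypothetical. Note that two pip package names differ from their import names: `scikit-learn` is imported as `sklearn` and `Pillow` as `PIL`.

```python
import importlib.util

# pip package name -> import name; they differ for scikit-learn and Pillow.
REQUIRED = {
    "streamlit": "streamlit",
    "datasets": "datasets",
    "pandas": "pandas",
    "huggingface_hub": "huggingface_hub",
    "scikit-learn": "sklearn",
    "seaborn": "seaborn",
    "matplotlib": "matplotlib",
    "numpy": "numpy",
    "Pillow": "PIL",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing:", ", ".join(missing) if missing else "none")
```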

## Usage

1. Set the `HF_TOKEN` environment variable to your Hugging Face access token.
2. Run the Streamlit app:

```bash
streamlit run app.py
```
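
How `app.py` reads the token is not shown here, so the following is only a sketch of an early check that fails with a clear message when `HF_TOKEN` is missing. The helper name `read_hf_token` is hypothetical.

```python
import os


def read_hf_token(env=os.environ):
    """Return the token stored in HF_TOKEN, or raise with a clear message."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export it before running the app.")
    return token
```

Passing the environment as a parameter keeps the function easy to test: `read_hf_token({"HF_TOKEN": "abc"})` returns `"abc"` without touching the real environment.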

## Features

- **Dataset Information:** View the dataset's structure, data types, and descriptive statistics.
- **Naive Bayes Classifier:** Train and evaluate a Naive Bayes model.
- **Decision Tree Classifier:** Train and evaluate a Decision Tree model.
- **K-Nearest Neighbors Classifier:** Train and evaluate a K-Nearest Neighbors model.
- **Confusion Matrix and Classification Report:** Visualize the performance of each model.
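
To show how a confusion matrix relates to the per-class figures in a classification report, the sketch below derives precision and recall by hand. It assumes the scikit-learn convention (rows are true classes, columns are predicted classes). The helper name `per_class_metrics` is hypothetical, and the counts are invented for illustration, not results from this app.

```python
def per_class_metrics(matrix, labels):
    """Compute precision and recall per class from a square confusion matrix.

    matrix[i][j] counts samples whose true class is labels[i] and whose
    predicted class is labels[j].
    """
    metrics = {}
    for k, name in enumerate(labels):
        true_positive = matrix[k][k]
        predicted_as_k = sum(row[k] for row in matrix)  # column sum
        actually_k = sum(matrix[k])  # row sum
        precision = true_positive / predicted_as_k if predicted_as_k else 0.0
        recall = true_positive / actually_k if actually_k else 0.0
        metrics[name] = {"precision": precision, "recall": recall}
    return metrics


# Invented counts for illustration only.
labels = ["Normal", "DoS", "Exploits"]
matrix = [
    [90, 5, 5],
    [10, 30, 10],
    [0, 15, 35],
]
metrics = per_class_metrics(matrix, labels)
for name, m in metrics.items():
    print(f"{name}: precision={m['precision']:.2f} recall={m['recall']:.2f}")
```

Low recall for an attack class means many of its flows were labeled as something else, which is usually the costlier error for an intrusion detector.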

## Screenshots

*Screenshot: Cybersecurity*

## License

This project is licensed under the MIT License.

## Acknowledgements

- The UNSW-NB15 dataset creators at the University of New South Wales, Canberra.
- The Hugging Face team for providing the datasets and tools.