---
license: mit
language:
- en
auto_detected: true
datasets:
- Canstralian/pentesting_dataset
- Canstralian/Wordlists
- Canstralian/ShellCommands
- Canstralian/CyberExploitDB
- Chemically-motivated/CyberSecurityDataset
- Chemically-motivated/AI-Agent-Generating-Tool-Debugging-Prompt-Library
metrics:
- accuracy
- precision
- f1
- code_eval
base_model:
- WhiteRabbitNeo/WhiteRabbitNeo-33B-v1.5
library_name: transformers
tags:
- code
---

# CyberAttackDetection

## Overview

**CyberAttackDetection** is a fine-tuned BERT-based sequence classification model that detects descriptions of cyberattacks in text. It assigns each input one of two labels:  
- **Attack (1)**: The text describes a cybersecurity threat or attack.  
- **Non-Attack (0)**: The text does not describe a cybersecurity threat.

---

## Model Details

- **License**: [MIT License](LICENSE)
- **Datasets**:  
  - Custom cybersecurity datasets:  
    - `Canstralian/pentesting_dataset`  
    - `Canstralian/Wordlists`  
    - `Canstralian/ShellCommands`  
    - `Canstralian/CyberExploitDB`  
    - `Chemically-motivated/CyberSecurityDataset`  
    - `Chemically-motivated/AI-Agent-Generating-Tool-Debugging-Prompt-Library`  
- **Language**: English  
- **Metrics**:  
  - **Accuracy**: 85%  
  - **F1 Score**: 0.83  
  - **Precision**: 0.80  
  - **Recall**: 0.87  
- **Base Model**: `WhiteRabbitNeo/WhiteRabbitNeo-33B-v1.5`  
- **Pipeline Tag**: `text-classification`  
- **Library Name**: `transformers`  
- **Tags**: `cybersecurity`, `text-classification`, `attack-detection`, `BERT`  
- **New Version**: `v1.0.0`  
- **Auto-Detected Features**: True  

---

## Model Usage

### Installation
Install the required dependencies before using the model:  
```bash
pip install transformers torch
```

### Example Code
Use the following Python code to load the model and classify a sample text:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "Canstralian/CyberAttackDetection"

# Load the fine-tuned model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Example input: a short security-related description
text = "A vulnerability was discovered in the server software."

# Tokenize the input, truncating anything beyond the model's maximum length
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Predict the label (1 = attack, 0 = non-attack)
prediction = outputs.logits.argmax(dim=-1).item()
print(f"Prediction: {'Attack' if prediction == 1 else 'Non-Attack'}")
```
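
The example above reports only the argmax label. As a minimal sketch, the snippet below shows how a pair of logits could be turned into class probabilities with a softmax, assuming the label order given in the overview (0 = Non-Attack, 1 = Attack). The logit values, the helper names `softmax` and `classify`, and the `threshold` parameter are illustrative assumptions, not part of the published model.

```python
import math

# Label order taken from the model card: 0 = Non-Attack, 1 = Attack.
LABELS = ["Non-Attack", "Attack"]


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    # Subtract the maximum logit for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, threshold=0.5):
    """Return (label, attack_probability) for one pair of logits.

    `threshold` is a hypothetical tuning knob: raising it trades recall
    for precision on the Attack class.
    """
    attack_prob = softmax(logits)[1]
    label = LABELS[1] if attack_prob >= threshold else LABELS[0]
    return label, attack_prob


# Hypothetical logits, e.g. obtained via outputs.logits[0].tolist().
label, prob = classify([-1.2, 0.8])
print(f"{label} (attack probability {prob:.2f})")
```

With the default threshold of 0.5 this matches the argmax decision; a stricter threshold may suit settings where false alarms are costly.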

---

## Prompts
- Open Ports: "Analyze the following network scan report and identify open ports and their associated vulnerabilities. Suggest best practices to secure these ports: [Insert network scan report]."
- Outdated Software or Services: "Given this list of installed software and services, identify outdated versions and known vulnerabilities. Provide recommendations for updates or patches to mitigate risks: [Insert software and service list]."
- Default Credentials: "Scan the following system configurations for any use of default credentials. Provide a list of affected services and recommendations for securing these credentials: [Insert system configuration details]."
- Misconfigurations: "Evaluate the provided system configuration for potential misconfigurations. Highlight risks and provide recommendations for secure setup: [Insert system configuration details]."
- Injection Flaws: "Review the given web application code or request logs and identify potential injection vulnerabilities such as SQL injection, command injection, or XSS. Provide remediation steps: [Insert code or logs]."
- Unencrypted Services: "Analyze the following network configuration and identify services that are transmitting data without encryption. Suggest strategies to enforce secure transmission: [Insert network configuration details]."
- Known Software Vulnerabilities: "Review the provided software inventory and cross-reference it with known vulnerabilities in the National Vulnerability Database (NVD). Recommend patches or workarounds: [Insert software inventory]."
- Cross-Site Request Forgery (CSRF): "Examine the provided web application code for potential CSRF vulnerabilities. Suggest specific coding or configuration techniques to prevent these attacks: [Insert code]."
- Insecure Direct Object References (IDOR): "Analyze the provided API endpoints and their associated access controls. Identify any IDOR vulnerabilities and suggest secure implementation strategies: [Insert API endpoint details]."
- Security Misconfigurations in Web Servers/Applications: "Assess the given web server configuration for security misconfigurations, such as improper HTTP headers or verbose error messages. Recommend changes to harden the server: [Insert server configuration]."
- Broken Authentication and Session Management: "Review the provided authentication and session management implementation. Identify weaknesses and recommend strategies to prevent compromise: [Insert authentication/session management details]."
- Sensitive Data Exposure: "Analyze the system's data handling processes and storage practices to identify potential sensitive data exposure. Recommend measures to protect sensitive information: [Insert system details]."
- API Vulnerabilities: "Examine the following API documentation and implementation for vulnerabilities, including insecure endpoints and data leakage. Provide recommendations for securing the API: [Insert API documentation]."
- Denial of Service (DoS) Vulnerabilities: "Review the system's architecture and configuration for potential vulnerabilities to DoS attacks. Suggest mitigation strategies such as rate limiting and load balancing: [Insert system architecture]."
- Buffer Overflows: "Analyze the provided code or application for buffer overflow vulnerabilities. Highlight potential weak points and recommend secure coding practices to prevent exploitation: [Insert code]."
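
Each template above ends with a bracketed `[Insert ...]` placeholder. The sketch below shows one way to substitute real input for that placeholder. Two templates are copied from the list; the dictionary keys, the `fill_prompt` helper, and the sample scan text are hypothetical names introduced for illustration.

```python
import re

# Two templates copied from the list above; the dictionary keys are hypothetical.
PROMPTS = {
    "open_ports": (
        "Analyze the following network scan report and identify open ports "
        "and their associated vulnerabilities. Suggest best practices to "
        "secure these ports: [Insert network scan report]."
    ),
    "buffer_overflows": (
        "Analyze the provided code or application for buffer overflow "
        "vulnerabilities. Highlight potential weak points and recommend "
        "secure coding practices to prevent exploitation: [Insert code]."
    ),
}

# Matches the "[Insert ...]" placeholder that ends each template.
PLACEHOLDER = re.compile(r"\[Insert [^\]]*\]")


def fill_prompt(name, content):
    """Replace the single [Insert ...] placeholder in a template with content."""
    template = PROMPTS[name]
    # A callable replacement keeps any backslashes in `content` literal.
    filled, count = PLACEHOLDER.subn(lambda _match: content, template)
    if count != 1:
        raise ValueError(f"expected one placeholder in {name!r}, found {count}")
    return filled


scan = "22/tcp open ssh\n80/tcp open http"  # hypothetical scan excerpt
print(fill_prompt("open_ports", scan))
```

Checking the substitution count guards against templates that were edited and lost their placeholder.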

---

## Model Training Details

### Training Objective
The model was fine-tuned for **binary classification**: each input text is labeled as describing an attack (1) or not (0).

### Training Data
- The training data combines cybersecurity attack descriptions with non-attack examples drawn from curated datasets (see the dataset list under Model Details).

---

## Evaluation

The model was evaluated on a balanced test set using the following metrics:  
- **Accuracy**: 85%  
- **F1 Score**: 0.83  
- **Precision**: 0.80  
- **Recall**: 0.87  

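Recall (0.87) is higher than precision (0.80): the model misses relatively few attack descriptions, at the cost of more false positives. Roughly one in five texts flagged as an attack is not one.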
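
The metrics above follow the standard confusion-matrix definitions. As a rough sketch, the helper below computes them from raw counts and checks that the reported F1 score is consistent with the reported precision and recall. The function name `binary_metrics` and the example counts are hypothetical; the counts are for illustration only and are not this model's test results.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for illustration only; not this model's test results.
example = binary_metrics(tp=40, fp=10, fn=5, tn=45)
print({name: round(value, 3) for name, value in example.items()})

# F1 is the harmonic mean of precision and recall, so the reported
# precision (0.80) and recall (0.87) imply the reported F1 of 0.83.
implied_f1 = 2 * 0.80 * 0.87 / (0.80 + 0.87)
print(round(implied_f1, 2))
```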

---

## License

This project is licensed under the **MIT License**. Refer to the [LICENSE](LICENSE) file for details.

---

## How to Contribute

We welcome contributions!  
- **Submit Issues**: If you encounter problems, open an issue on the repository.  
- **Pull Requests**: Feel free to contribute code improvements or documentation updates.

---

## Contact

For further information or inquiries, contact: **canstralian@cybersecurity.com**