---
library_name: transformers
tags:
- text-to-SQL
- SQL
- code-generation
- NLQ-to-SQL
- text2SQL
- Security
- Vulnerability detection
datasets:
- salmane11/SQLShield
language:
- en
base_model:
- microsoft/codebert-base
---
# SQLQueryShield
## Model Description
SQLQueryShield is a vulnerable SQL query detection model. It classifies SQL queries as either vulnerable (e.g., prone to SQL injection or unsafe execution) or benign (safe to execute).
The checkpoint in this repository is based on [microsoft/codebert-base](https://huggingface.co/microsoft/codebert-base) and fine-tuned on [SQLShield](https://huggingface.co/datasets/salmane11/SQLShield), a dataset for text-to-SQL vulnerability detection made up of vulnerable and safe natural-language questions (NLQs) and their corresponding SQL queries.
## Finetuning Procedure
The model was fine-tuned using the Hugging Face Transformers library. The following steps were used:
1. Dataset: SQLShield. Only the SQL queries from the (NLQ, SQL) pairs were used for training.
2. Preprocessing:
   - Input Format: Raw SQL query strings.
   - Tokenization: Tokenized using the microsoft/codebert-base tokenizer.
   - Max Length: 128 tokens.
   - Padding and truncation applied.
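The preprocessing settings above can be sketched roughly as follows. This is an illustrative reconstruction, not the original training script: the helper names `encode_queries` and `load_codebert_tokenizer` are hypothetical, and padding to the full 128 tokens (`padding="max_length"`) is an assumption, since the card only says that padding was applied.

```python
from typing import Any, Callable, Sequence

MAX_LENGTH = 128  # from the card: inputs are capped at 128 tokens


def encode_queries(tokenizer: Callable[..., Any], queries: Sequence[str]) -> Any:
    """Tokenize raw SQL strings with padding and truncation to MAX_LENGTH."""
    return tokenizer(
        list(queries),
        max_length=MAX_LENGTH,
        padding="max_length",  # assumption: the card does not name the padding strategy
        truncation=True,
    )


def load_codebert_tokenizer() -> Any:
    """Load the base model's tokenizer (needs `transformers` and network access)."""
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained("microsoft/codebert-base")
```

Calling `encode_queries(load_codebert_tokenizer(), ["SELECT campus FROM campuses"])` should return an encoding whose `input_ids` entries each hold 128 token ids under these assumed settings.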
## Intended Use and Limitations
SQLQueryShield is intended for use as a post-generation filter or analysis tool in any system that executes or generates SQL queries. Its main role is to detect whether a SQL query is potentially harmful due to vulnerability patterns such as SQL injection, improper string concatenation, or unsafe expressions.
Ideal use cases:
- Filtering SQL queries in Text-to-SQL applications
- Post-processing or validating user-generated SQL before execution
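One way to wire the model in as a post-generation filter is sketched below. The helpers `is_safe_to_execute` and `guarded_execute` and the `min_safe_score` threshold of 0.9 are hypothetical choices for illustration, not part of the model or the Transformers API. The `SAFE` label name is taken from the example outputs in this card, and `classifier` is assumed to behave like a `text-classification` pipeline that returns a list of `{'label', 'score'}` dicts.

```python
from typing import Any, Callable, Dict, List

# Assumed shape of a text-classification pipeline call on a single string.
Classifier = Callable[[str], List[Dict[str, Any]]]


def is_safe_to_execute(
    classifier: Classifier, query: str, min_safe_score: float = 0.9
) -> bool:
    """Return True only if the top prediction is SAFE with enough confidence."""
    top = classifier(query)[0]
    return top["label"] == "SAFE" and top["score"] >= min_safe_score


def guarded_execute(
    classifier: Classifier, query: str, execute: Callable[[str], Any]
) -> Any:
    """Run `execute(query)` only when the classifier accepts the query."""
    if not is_safe_to_execute(classifier, query):
        raise PermissionError("Query rejected by the vulnerability filter")
    return execute(query)
```

Here `classifier` would be the pipeline object built from `salmane11/SQLQueryShield`, and `execute` would be whatever function sends SQL to the database. Rejecting low-confidence `SAFE` predictions is a conservative design choice; the threshold should be tuned on real traffic.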
## How to Use
Example 1: Malicious
```python
from transformers import pipeline
sql_query_shield = pipeline("text-classification", model="salmane11/SQLQueryShield")
# For the following Table schema
# CREATE TABLE campuses
# (
# campus VARCHAR,
# location VARCHAR
# )
query = "SELECT campus FROM campuses WHERE location = '' UNION SELECT database() --"
prediction = sql_query_shield(query)
print(prediction)
#[{'label': 'MALICIOUS', 'score': 0.9995294809341431}]
```
Example 2: Safe
```python
from transformers import pipeline
sql_query_shield = pipeline("text-classification", model="salmane11/SQLQueryShield")
# For the following Table schema
# CREATE TABLE tv_channel
# (
# package_option VARCHAR,
# series_name VARCHAR
# )
query = "SELECT package_option FROM tv_channel WHERE series_name = 'Sky Radio'"
prediction = sql_query_shield(query)
print(prediction)
#[{'label': 'SAFE', 'score': 0.999503493309021}]
```
## Cite our work
Citation