---
library_name: transformers
tags:
- text-to-SQL
- SQL
- code-generation
- NLQ-to-SQL
- text2SQL
- Security
- Vulnerability detection
datasets:
- salmane11/SQLShield
language:
- en
base_model:
- microsoft/codebert-base
---

# SQLQueryShield

## Model Description

SQLQueryShield is a vulnerable SQL query detection model. It classifies SQL queries as either vulnerable (e.g., prone to SQL injection or unsafe execution) or benign (safe to execute).

The checkpoint in this repository is based on [microsoft/codebert-base](https://huggingface.co/microsoft/codebert-base) and fine-tuned on [SQLShield](https://huggingface.co/datasets/salmane11/SQLShield), a dataset for text-to-SQL vulnerability detection that pairs vulnerable and safe natural-language questions (NLQs) with their SQL queries.


## Fine-tuning Procedure
The model was fine-tuned using the Hugging Face Transformers library. The following steps were used:

1. Dataset: SQLShield. Only the SQL queries from the (NLQ, SQL) pairs were used for training.

2. Preprocessing:

    - Input Format: Raw SQL query strings.

    - Tokenization: The `microsoft/codebert-base` tokenizer.

    - Max Length: 128 tokens.

    - Padding and truncation applied.
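
The preprocessing settings above can be sketched as follows. This is a minimal illustration, not the authors' training script: the helper names `load_tokenizer` and `preprocess` are hypothetical, and padding every query to the full 128 tokens (`padding="max_length"`) is an assumption, since this card only states that padding and truncation were applied.

```python
MAX_LENGTH = 128  # maximum sequence length stated in this card


def load_tokenizer(name="microsoft/codebert-base"):
    """Load the base model's tokenizer (downloads from the Hub on first use)."""
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(name)


def preprocess(queries, tokenizer, max_length=MAX_LENGTH):
    """Tokenize raw SQL query strings with padding and truncation."""
    return tokenizer(
        list(queries),
        padding="max_length",  # assumption: pad each query to max_length
        truncation=True,       # cut queries longer than max_length
        max_length=max_length,
    )
```

Calling `preprocess(queries, load_tokenizer())` on a list of SQL strings should return the `input_ids` and `attention_mask` that a sequence classification model expects.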

## Intended Use and Limitations

SQLQueryShield is intended for use as a post-generation filter or analysis tool in any system that executes or generates SQL queries. Its main role is to detect whether a SQL query is potentially harmful due to vulnerability patterns such as SQL injection, improper string concatenation, or unsafe expressions.

Ideal use cases:

- Filtering SQL queries in Text-to-SQL applications

- Post-processing or validating user-generated SQL before execution
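
A post-generation filter of this kind could look like the sketch below. It assumes `classifier` is any callable with the same interface as the `text-classification` pipeline, returning a list of `{'label', 'score'}` dictionaries with the labels `SAFE` and `MALICIOUS`. The function names and the `threshold` parameter are hypothetical and not part of this repository.

```python
def is_safe_to_execute(query, classifier, threshold=0.5):
    """Return True only if the query is labeled SAFE with enough confidence."""
    result = classifier(query)[0]  # first (and only) prediction for one query
    return result["label"] == "SAFE" and result["score"] >= threshold


def filter_queries(queries, classifier, threshold=0.5):
    """Split queries into (allowed, blocked) lists based on the classifier verdict."""
    allowed, blocked = [], []
    for query in queries:
        if is_safe_to_execute(query, classifier, threshold):
            allowed.append(query)
        else:
            blocked.append(query)
    return allowed, blocked
```

Only the queries in `allowed` would then be passed on to the database. Raising `threshold` makes the filter stricter, at the cost of blocking more safe queries.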


## How to Use

Example 1: Malicious

```python
from transformers import pipeline

sql_query_shield = pipeline("text-classification", model="salmane11/SQLQueryShield")

# For the following Table schema
# CREATE TABLE campuses
#   (
#      campus   VARCHAR,
#      location VARCHAR
#   )

query = "SELECT campus FROM campuses WHERE location = '' UNION SELECT database() --"

prediction = sql_query_shield(query)
print(prediction)
# [{'label': 'MALICIOUS', 'score': 0.9995294809341431}]
```


Example 2: Safe

```python
from transformers import pipeline

sql_query_shield = pipeline("text-classification", model="salmane11/SQLQueryShield")

# For the following Table schema
# CREATE TABLE tv_channel
#   (
#      package_option VARCHAR,
#      series_name    VARCHAR
#   ) 

query = "SELECT package_option FROM tv_channel WHERE series_name = 'Sky Radio'"


prediction = sql_query_shield(query)
print(prediction)
# [{'label': 'SAFE', 'score': 0.999503493309021}]
```


## Cite our work

Citation