salmane11 committed on
Commit 3e7566e · verified · 1 Parent(s): a265e9a

Upload README.md

Files changed (1)
  1. README.md +103 -0
README.md ADDED
@@ -0,0 +1,103 @@
+ ---
+ library_name: transformers
+ tags:
+ - text-to-SQL
+ - SQL
+ - code-generation
+ - NLQ-to-SQL
+ - text2SQL
+ - Security
+ - Vulnerability detection
+ datasets:
+ - salmane11/SQLShield
+ language:
+ - en
+ base_model:
+ - microsoft/codebert-base
+ ---
+
+ # SQLQueryShield
+
+ ## Model Description
+
+ SQLQueryShield is a binary classifier that detects vulnerable SQL queries. It labels each query as either vulnerable (e.g., prone to SQL injection or unsafe to execute) or benign (safe to execute).
+
+ The checkpoint in this repository is based on [microsoft/codebert-base](https://huggingface.co/microsoft/codebert-base) and fine-tuned on [SQLShield](https://huggingface.co/datasets/salmane11/SQLShield), a dataset for text-to-SQL vulnerability detection that pairs vulnerable and safe natural-language questions (NLQs) with their corresponding SQL queries.
+
+
+ ## Fine-tuning Procedure
+ The model was fine-tuned using the Hugging Face Transformers library. The setup was as follows:
+
+ 1. Dataset: SQLShield. Only the SQL queries from the (NLQ, SQL) pairs were used for training.
+
+ 2. Preprocessing:
+
+    - Input format: raw SQL query strings.
+
+    - Tokenization: the microsoft/codebert-base tokenizer.
+
+    - Max length: 128 tokens.
+
+    - Padding and truncation applied.
+
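The preprocessing settings listed above could be expressed roughly as follows. This is a minimal sketch, not the authors' training code: the helper name `encode_queries` is hypothetical, and padding to the full 128 tokens (`padding="max_length"`) is an assumption, since the README only states that padding and truncation were applied.

```python
from typing import Any, Callable, Sequence

# Maximum sequence length stated in the preprocessing list.
MAX_LENGTH = 128


def encode_queries(tokenizer: Callable[..., Any], queries: Sequence[str]) -> Any:
    """Tokenize raw SQL strings with padding and truncation (hypothetical helper).

    `tokenizer` is expected to behave like a Hugging Face tokenizer,
    i.e. a callable accepting `padding`, `truncation`, and `max_length`.
    """
    return tokenizer(
        list(queries),
        padding="max_length",  # assumption: pad every query to MAX_LENGTH
        truncation=True,       # cut queries longer than MAX_LENGTH
        max_length=MAX_LENGTH,
    )
```

With Transformers installed, a tokenizer loaded via `AutoTokenizer.from_pretrained("microsoft/codebert-base")` can be passed as the `tokenizer` argument.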
+ ## Intended Use and Limitations
+
+ SQLQueryShield is intended as a post-generation filter or analysis tool for systems that generate or execute SQL queries. It flags queries that are potentially harmful because they match vulnerability patterns such as SQL injection, improper string concatenation, or unsafe expressions.
+
+ Ideal use cases:
+
+ - Filtering SQL queries in text-to-SQL applications
+
+ - Post-processing or validating user-generated SQL before execution
+
+
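A post-generation filter of this kind might look like the sketch below. The function `should_execute`, its `threshold` parameter, and the blocking policy are hypothetical and not part of the model's API. The sketch assumes the classifier returns a list of `{'label', 'score'}` dictionaries, as a Transformers `text-classification` pipeline does, and that the vulnerable class is labeled `"MALICIOUS"`, as in this README's usage examples.

```python
from typing import Callable, Dict, List

# A classifier maps one SQL string to a list of {'label': ..., 'score': ...} dicts.
Classifier = Callable[[str], List[Dict[str, object]]]


def should_execute(
    classifier: Classifier,
    query: str,
    malicious_label: str = "MALICIOUS",  # assumption: label name taken from the README examples
    threshold: float = 0.5,              # hypothetical confidence cutoff
) -> bool:
    """Return False when the top prediction flags the query as malicious."""
    top = classifier(query)[0]
    flagged = top["label"] == malicious_label and float(top["score"]) >= threshold
    return not flagged
```

In a text-to-SQL application, the pipeline object would be passed as `classifier`, and a generated query would be sent to the database only when `should_execute` returns `True`. Raising `threshold` makes the filter more permissive, which may or may not suit a given deployment.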
+ ## How to Use
+
+ Example 1: Malicious
+
+ ```python
+ from transformers import pipeline
+
+ sql_query_shield = pipeline("text-classification", model="salmane11/SQLQueryShield")
+
+ # For the following table schema:
+ # CREATE TABLE campuses
+ # (
+ #     campus VARCHAR,
+ #     location VARCHAR
+ # )
+
+ # The injected UNION SELECT and trailing comment marker make this query malicious.
+ query = "SELECT campus FROM campuses WHERE location = '' UNION SELECT database() --"
+
+ prediction = sql_query_shield(query)
+ print(prediction)
+ # Example output (score is illustrative): [{'label': 'MALICIOUS', 'score': 0.9}]
+ ```
+
+
+ Example 2: Safe
+
+ ```python
+ from transformers import pipeline
+
+ sql_query_shield = pipeline("text-classification", model="salmane11/SQLQueryShield")
+
+ # For the following table schema:
+ # CREATE TABLE tv_channel
+ # (
+ #     package_option VARCHAR,
+ #     series_name VARCHAR
+ # )
+
+ query = "SELECT package_option FROM tv_channel WHERE series_name = 'Sky Radio'"
+
+ prediction = sql_query_shield(query)
+ print(prediction)
+ # Example output (score is illustrative): [{'label': 'SAFE', 'score': 0.99}]
+ ```
+
+
+ ## Cite our work
+
+ Citation