salmane11 committed on
Commit ca4118e · verified · 1 Parent(s): 3e7566e

Update README.md

Files changed (1)
  1. README.md +102 -102
README.md CHANGED
@@ -1,103 +1,103 @@
- ---
- library_name: transformers
- tags:
- - text-to-SQL
- - SQL
- - code-generation
- - NLQ-to-SQL
- - text2SQL
- - Security
- - Vulnerability detection
- datasets:
- - salmane11/SQLShield
- language:
- - en
- base_model:
- - microsoft/codebert-base
- ---
-
- # SQLQueryShield
-
- ## Model Description
-
- SQLQueryShield is a vulnerable SQL query detection model. It classifies SQL queries as either vulnerable (e.g., prone to SQL injection or unsafe execution) or benign (safe to execute).
-
- The checkpoint included in this repository is based on [microsoft/codebert-base](https://huggingface.co/microsoft/codebert-base) and further finetuned on [SQLShield](https://huggingface.co/datasets/salmane11/SQLShield), a dataset dedicated to text-to-SQL vulnerability detection composed of vulnerable and safe NLQs and their related SQL queries.
-
-
- ## Finetuning Procedure
- The model was fine-tuned using the Hugging Face Transformers library. The following steps were used:
-
- 1. Dataset: SSQLShield, only the SQL queries from the (NLQ, SQL) pairs were used for training.
-
- 2. Preprocessing:
-
- - Input Format: Raw SQL query strings.
-
- - Tokenization: Tokenized using microsoft/codebert-base.
-
- - Max Length: 128 tokens.
-
- - Padding and truncation applied.
-
- ## Intended Use and Limitations
-
- SQLQueryShield is intended for use as a post-generation filter or analysis tool in any system that executes or generates SQL queries. Its main role is to detect whether a SQL query is potentially harmful due to vulnerability patterns such as SQL injection, improper string concatenation, or unsafe expressions.
-
- Ideal use cases:
-
- - Filtering SQL queries in Text-to-SQL applications
-
- - Post-processing or validating user-generated SQL before execution
-
-
- ## How to Use
-
- Example 1: Malicious
-
- ```python
- from transformers import pipeline
-
- sql_query_shield = pipeline("text-classification", model="salmane11/SQLQueryShield")
-
- # For the following Table schema
- # CREATE TABLE campuses
- # (
- # campus VARCHAR,
- # location VARCHAR
- # )
-
- query = "SELECT campus FROM campuses WHERE location = '' UNION SELECT database() --"
-
- prediction = sql_query_shield(query)
- print(prediction)
- #{label:"MALICIOUS", probaility:0.9}
- ```
-
-
- Example 2: Safe
-
- ```python
- from transformers import pipeline
-
- sql_query_shield = pipeline("text-classification", model="salmane11/SQLQueryShield")
-
- # For the following Table schema
- # CREATE TABLE tv_channel
- # (
- # package_option VARCHAR,
- # series_name VARCHAR
- # )
-
- query = "SELECT package_option FROM tv_channel WHERE series_name = 'Sky Radio'"
-
-
- prediction = sql_query_shield(query)
- print(prediction)
- #{label:"SAFE", probaility:0.99}
- ```
-
-
- ## Cite our work
-
+ ---
+ library_name: transformers
+ tags:
+ - text-to-SQL
+ - SQL
+ - code-generation
+ - NLQ-to-SQL
+ - text2SQL
+ - Security
+ - Vulnerability detection
+ datasets:
+ - salmane11/SQLShield
+ language:
+ - en
+ base_model:
+ - microsoft/codebert-base
+ ---
+
+ # SQLQueryShield
+
+ ## Model Description
+
+ SQLQueryShield is a vulnerable SQL query detection model. It classifies SQL queries as either vulnerable (e.g., prone to SQL injection or unsafe execution) or benign (safe to execute).
+
+ The checkpoint included in this repository is based on [microsoft/codebert-base](https://huggingface.co/microsoft/codebert-base) and further finetuned on [SQLShield](https://huggingface.co/datasets/salmane11/SQLShield), a dataset dedicated to text-to-SQL vulnerability detection composed of vulnerable and safe NLQs and their related SQL queries.
+
+
+ ## Finetuning Procedure
+ The model was fine-tuned using the Hugging Face Transformers library. The following steps were used:
+
+ 1. Dataset: SSQLShield, only the SQL queries from the (NLQ, SQL) pairs were used for training.
+
+ 2. Preprocessing:
+
+ - Input Format: Raw SQL query strings.
+
+ - Tokenization: Tokenized using microsoft/codebert-base.
+
+ - Max Length: 128 tokens.
+
+ - Padding and truncation applied.
+
+ ## Intended Use and Limitations
+
+ SQLQueryShield is intended for use as a post-generation filter or analysis tool in any system that executes or generates SQL queries. Its main role is to detect whether a SQL query is potentially harmful due to vulnerability patterns such as SQL injection, improper string concatenation, or unsafe expressions.
+
+ Ideal use cases:
+
+ - Filtering SQL queries in Text-to-SQL applications
+
+ - Post-processing or validating user-generated SQL before execution
+
+
+ ## How to Use
+
+ Example 1: Malicious
+
+ ```python
+ from transformers import pipeline
+
+ sql_query_shield = pipeline("text-classification", model="salmane11/SQLQueryShield")
+
+ # For the following Table schema
+ # CREATE TABLE campuses
+ # (
+ # campus VARCHAR,
+ # location VARCHAR
+ # )
+
+ query = "SELECT campus FROM campuses WHERE location = '' UNION SELECT database() --"
+
+ prediction = sql_query_shield(query)
+ print(prediction)
+ #[{'label': 'MALICIOUS', 'score': 0.9995294809341431}]
+ ```
+
+
+ Example 2: Safe
+
+ ```python
+ from transformers import pipeline
+
+ sql_query_shield = pipeline("text-classification", model="salmane11/SQLQueryShield")
+
+ # For the following Table schema
+ # CREATE TABLE tv_channel
+ # (
+ # package_option VARCHAR,
+ # series_name VARCHAR
+ # )
+
+ query = "SELECT package_option FROM tv_channel WHERE series_name = 'Sky Radio'"
+
+
+ prediction = sql_query_shield(query)
+ print(prediction)
+ #[{'label': 'SAFE', 'score': 0.999503493309021}]
+ ```
+
+
+ ## Cite our work
+
  Citation
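
Note on the corrected output format: the only content change in this hunk is the example output, which now shows a list holding one dict with `label` and `score` keys. The sketch below is one way a caller might consume that shape as the post-generation filter the README describes. It is not part of the commit. The names `is_query_allowed`, `fake_classifier`, and the `min_safe_score` threshold are assumptions made for illustration, and the stand-in classifier exists only so the snippet runs without downloading the model.

```python
from typing import Callable, Dict, List, Union

# Output shape shown in the updated README examples:
# a list with one dict holding 'label' and 'score'.
Prediction = List[Dict[str, Union[str, float]]]


def is_query_allowed(
    query: str,
    classify: Callable[[str], Prediction],
    min_safe_score: float = 0.9,  # hypothetical threshold, not from the README
) -> bool:
    """Return True only when the top prediction is SAFE with enough confidence."""
    top = classify(query)[0]
    return top["label"] == "SAFE" and float(top["score"]) >= min_safe_score


def fake_classifier(query: str) -> Prediction:
    """Hypothetical stand-in for the real pipeline so the sketch runs offline."""
    label = "MALICIOUS" if "UNION SELECT" in query.upper() else "SAFE"
    return [{"label": label, "score": 0.99}]


# With the real model, pass the pipeline object itself as `classify`:
#   from transformers import pipeline
#   classify = pipeline("text-classification", model="salmane11/SQLQueryShield")

safe_query = "SELECT package_option FROM tv_channel WHERE series_name = 'Sky Radio'"
bad_query = "SELECT campus FROM campuses WHERE location = '' UNION SELECT database() --"

print(is_query_allowed(safe_query, fake_classifier))  # True
print(is_query_allowed(bad_query, fake_classifier))   # False
```

Checking the label before the score means a confident MALICIOUS prediction is never mistaken for a pass, and a low-confidence SAFE prediction is rejected as well.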