afrideva committed on
Commit
268e709
·
verified ·
1 Parent(s): 84c1784

Upload README.md with huggingface_hub

  1. README.md +206 -0
README.md ADDED
@@ -0,0 +1,206 @@
1
+ ---
2
+ base_model: PipableAI/pip-sql-1.3b
3
+ datasets:
4
+ - PipableAI/pip-txt-to-sql-spider-bird-dataset
5
+ inference: true
6
+ language:
7
+ - en
8
+ library_name: transformers
9
+ license: apache-2.0
10
+ metrics:
11
+ - accuracy
12
+ model_creator: PipableAI
13
+ model_name: pip-sql-1.3b
14
+ pipeline_tag: text-generation
15
+ quantized_by: afrideva
16
+ tags:
17
+ - sql
18
+ - code
19
+ - text2sql
20
+ - instruction_tuned
21
+ - basemodel
22
+ - jax
23
+ - pytorch
24
+ - text-generation-inference
25
+ - gguf
26
+ - ggml
27
+ - quantized
28
+ widget:
29
+ - example_title: example
30
+ text: '<schema>CREATE TABLE system(JobID: String,GID: String, UID: String, Start:Time(yyyy/mm/dd),
31
+ End: Time,ElapsedRaw: Time, CPUTimeRAW: Time,NCPUS: Number,NNodes: Number, NodeList:
32
+ List, State:String, Timelimit: Time);</schema><question>Get UID and job id for
33
+ Jobs that started on Jan 20 , 2023 ended on feb 14 2023 and has job id 20</question><sql>'
34
+ ---
35
+
36
+ # pip-sql-1.3b-GGUF
37
+
38
+ Quantized GGUF model files for [pip-sql-1.3b](https://huggingface.co/PipableAI/pip-sql-1.3b) from [PipableAI](https://huggingface.co/PipableAI).
39
+
40
+ ## Original Model Card:
41
+
42
+ # pipSQL-1.3b
43
+
44
+ [PipableAI](https://www.linkedin.com/company/pipable.ai/about/)
45
+
46
+ [Colab notebook](https://colab.research.google.com/drive/1insSxvc3jjAXe0zmdIjmbG3ttb5mpRgQ?usp=sharing)
47
+
48
+ ## What have we built?
49
+ A 1.3B-parameter text-to-SQL model that outperforms most SQL expert models and ChatGPT on popular benchmarks.
50
+ This is a distilled model built on the DeepSeek base model.
51
+ Please refer to https://huggingface.co/PipableAI/pip-library-etl-1.3b for our state-of-the-art model.
52
+ ## How did we build it?
53
+
54
+ We used softmax cross-entropy and a modified form of policy gradient along with a Q loss, optimized in an EM setup.
55
+ Loss behaviour in this setup:
56
+
57
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/658d8095a2a6a6e0da8bb8a6/I80Ru1r4thoYrLagIWALa.png)
58
+
59
+ ## Benchmarking
60
+ For benchmarking we use Semantic Evaluation for Text-to-SQL with
61
+ Distilled Test Suites, an officially accepted evaluation framework for Spider, SParC, and CoSQL, proposed by researchers from Yale and Berkeley.
62
+ The benchmark contains 2200 test data points.
63
+ Here is the link to run the evaluation:
64
+
65
+
66
+ [Test Suite SQL Eval](https://github.com/taoyds/test-suite-sql-eval)
67
+
68
+ |model|easy|medium|hard|extra hard|
69
+ |-----|----|------|----|-----|
70
+ |sqlcoder-7b-2|72.0|58.0|40.6|37.3|
71
+ |pipSQL-1.3b|78.5|57.5|42.1|28.3|
72
+ |pipSQL-7b|63.0|40.0|30.2|25.0|
73
+ |sqlcoder-7b|60.6|48.2|28.3|20.4|
74
+ |gpt-3.5|58.8|44.7|31.0|28.4|
75
+
76
+ We have also benchmarked it on the Defog SQL-Eval.
77
+ It contains 200 test data points handpicked by the Defog team.
78
+ Here is the link to it:
79
+
80
+
81
+ [Defog SQL-Eval](https://github.com/defog-ai/sql-eval)
82
+ These are the results:
83
+
84
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64d32c6b921678fdc9de3302/fFeLSEYBNpQk_JWjFsF5M.png)
85
+
86
+ ## License
87
+ The model is open source under the Apache 2.0 license.
88
+
89
+ ## Usage
90
+
91
+ ### Installation
92
+
93
+ ```bash
94
+ pip install transformers
95
+ ```
96
+
97
+ ### Prompt
98
+ ```python
99
+ prompt = f"""<schema>{schema}</schema>
100
+ <question>{question}</question>
101
+ <sql>"""
102
+ ```
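The template above can be wrapped in two small helpers: one that fills in the schema and question, and one that pulls the query back out of the generated text. This is a minimal sketch, not part of the original model card. The names `build_prompt` and `extract_sql` are hypothetical, and the completion string is simulated rather than real model output.

```python
def build_prompt(schema: str, question: str) -> str:
    """Fill the <schema>/<question>/<sql> template used by the model."""
    return (
        f"<schema>{schema.strip()}</schema>\n"
        f"<question>{question.strip()}</question>\n"
        "<sql>"
    )


def extract_sql(generated: str) -> str:
    """Return the text between the first <sql> tag and the next </sql> tag.

    If the closing tag is missing (for example, generation was cut off),
    everything after <sql> is returned.
    """
    parts = generated.split("<sql>", 1)
    if len(parts) < 2:
        raise ValueError("no <sql> tag found in the generated text")
    return parts[1].split("</sql>", 1)[0].strip()


schema = "CREATE TABLE Products (product_id number, product_price number);"
question = "How many products are there?"
prompt = build_prompt(schema, question)

# Simulated completion (hypothetical, NOT real model output),
# used only to exercise extract_sql.
fake_completion = prompt + "SELECT count(*) FROM Products</sql>"
print(extract_sql(fake_completion))
```

Raising an error when `<sql>` is absent makes a malformed generation visible early, instead of failing later with an `IndexError` from chained `split` calls.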
103
+
104
+ ### PyTorch
105
+ ```python
106
+ from transformers import AutoModelForCausalLM, AutoTokenizer
107
+ device = "cuda"  # use "cpu" if no GPU is available
108
+ model = AutoModelForCausalLM.from_pretrained("PipableAI/pip-sql-1.3b").to(device)
109
+ tokenizer = AutoTokenizer.from_pretrained("PipableAI/pip-sql-1.3b")
110
+
111
+ inputs = tokenizer(prompt, return_tensors="pt").to(device)  # `prompt` is the string built in the Prompt section
112
+ outputs = model.generate(**inputs, max_new_tokens=200)
113
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True).split('<sql>')[1].split('</sql>')[0])
114
+ ```
115
+
116
+ ### Flax
117
+ ```python
118
+ from transformers import FlaxAutoModelForCausalLM, AutoTokenizer
119
+ # JAX selects the default backend automatically, so no device string is needed
120
+ model = FlaxAutoModelForCausalLM.from_pretrained("PipableAI/pip-sql-1.3b", from_pt=True)
121
+ tokenizer = AutoTokenizer.from_pretrained("PipableAI/pip-sql-1.3b")
122
+
123
+ inputs = tokenizer(prompt, return_tensors="jax")  # `prompt` is the string built in the Prompt section
124
+ outputs = model.generate(**inputs, max_new_tokens=200)
125
+ print(tokenizer.decode(outputs.sequences[0], skip_special_tokens=True).split('<sql>')[1].split('</sql>')[0])
126
+ ```
127
+
128
+ ## Examples
129
+
130
+ ### Schema
131
+ ```sql
132
+ CREATE TABLE Products (
133
+ product_id number,
134
+ parent_product_id number,
135
+ product_name text,
136
+ product_price number,
137
+ product_color text,
138
+ product_size text,
139
+ product_description text);
140
+
141
+ CREATE TABLE Customers (
142
+ customer_id number,
143
+ gender_code text,
144
+ customer_first_name text,
145
+ customer_middle_initial text,
146
+ customer_last_name text,
147
+ email_address text,
148
+ login_name text,
149
+ login_password text,
150
+ phone_number text,
151
+ address_line_1 text,
152
+ town_city text,
153
+ county text,
154
+ country text);
155
+
156
+ CREATE TABLE Customer_Payment_Methods (
157
+ customer_id number,
158
+ payment_method_code text);
159
+
160
+ CREATE TABLE Invoices (
161
+ invoice_number number,
162
+ invoice_status_code text,
163
+ invoice_date time);
164
+
165
+ CREATE TABLE Orders (
166
+ order_id number,
167
+ customer_id number,
168
+ order_status_code text,
169
+ date_order_placed time);
170
+
171
+ CREATE TABLE Order_Items (
172
+ order_item_id number,
173
+ product_id number,
174
+ order_id number,
175
+ order_item_status_code text);
176
+
177
+ CREATE TABLE Shipments (
178
+ shipment_id number,
179
+ order_id number,
180
+ invoice_number number,
181
+ shipment_tracking_number text,
182
+ shipment_date time);
183
+
184
+ CREATE TABLE Shipment_Items (
185
+ shipment_id number,
186
+ order_item_id number);
187
+ ```
188
+
189
+ ### Questions
190
+ What are the email address, town and county of the customers who are of the least common gender?
191
+ ```sql
192
+ SELECT email_address , town_city , county FROM customers GROUP BY gender_code ORDER BY count(*) ASC LIMIT 1
193
+ ```
194
+
195
+ What are the product price and the product size of the products whose price is above average?
196
+ ```sql
197
+ SELECT product_price , product_size FROM products WHERE product_price > (SELECT avg(product_price) FROM products)
198
+ ```
199
+
200
+ Which customers did not make any orders? List the first name, middle initial and last name.
201
+ ```sql
202
+ SELECT T1.customer_first_name , T1.customer_middle_initial , T1.customer_last_name FROM Customers AS T1 WHERE T1.customer_id NOT IN (SELECT T2.customer_id FROM Orders AS T2)
203
+ ```
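One way to sanity-check a generated query is to execute it against a throwaway database built from the same schema. The sketch below is an illustration, not part of the original evaluation. It assumes SQLite is an acceptable stand-in for the target database (SQLite is lenient about type names such as `number`), and the three product rows are invented.

```python
import sqlite3

# In-memory database with the Products table from the example schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products ("
    "product_id number, parent_product_id number, product_name text, "
    "product_price number, product_color text, product_size text, "
    "product_description text)"
)

# Hypothetical rows, invented purely for this sketch.
rows = [
    (1, None, "mug", 5.5, "white", "small", ""),
    (2, None, "lamp", 20.25, "black", "medium", ""),
    (3, None, "desk", 95.75, "brown", "large", ""),
]
conn.executemany("INSERT INTO Products VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Query copied from the "above average price" example.
generated_sql = (
    "SELECT product_price , product_size FROM products "
    "WHERE product_price > (SELECT avg(product_price) FROM products)"
)

# A syntax error or an unknown column would raise sqlite3.OperationalError here.
result = conn.execute(generated_sql).fetchall()
print(result)
conn.close()
```

With these rows the average price is 40.5, so only the 95.75 product is returned. A check like this catches invalid SQL and obviously wrong results, but it does not prove that the query is semantically correct for every database state.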
204
+
205
+ ### Team
206
+ Avi Kothari, Pratham Gupta, Ritvik Aryan Kalra, Rohan Bhatial, Soham Acharya