aashish1904 committed on
Commit
b06a85b
·
verified ·
1 Parent(s): 551ed30

Upload README.md with huggingface_hub

  1. README.md +170 -0
README.md ADDED
@@ -0,0 +1,170 @@
+
+ ---
+
+ base_model: deepseek-ai/deepseek-coder-6.7b-instruct
+ tags:
+ - instruct
+ - finetune
+ library_name: transformers
+ license: cc-by-sa-4.0
+ pipeline_tag: text-generation
+
+ ---
+
+ [![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)
+
+
+ # QuantFactory/natural-sql-7b-GGUF
+ This is a quantized version of [chatdb/natural-sql-7b](https://huggingface.co/chatdb/natural-sql-7b) created using llama.cpp.
+
+ # Original Model Card
+
+
+ # **Natural-SQL-7B by ChatDB**
+ ## Natural-SQL-7B is a model with very strong performance on Text-to-SQL instructions. It has an excellent understanding of complex questions and outperforms models of the same size in its space.
+
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/648a374f00f7a3374ee64b99/hafdsfrFCqrVbATIzV_EN.png" width="600">
+
+ [ChatDB.ai](https://chatdb.ai) | [Notebook](https://github.com/cfahlgren1/natural-sql/blob/main/natural-sql-7b.ipynb) | [Twitter](https://twitter.com/calebfahlgren)
+
+ # **Benchmarks**
+ ### *Results on Novel Datasets not trained on via SQL-Eval*
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/648a374f00f7a3374ee64b99/5ynfoKPzI3_-WasQQt7qR.png" width="800">
+
+ <em>Big thanks to the [defog](https://huggingface.co/defog) team for open-sourcing [sql-eval](https://github.com/defog-ai/sql-eval)</em>👏
+
+ Natural-SQL can also handle complex, compound questions that other models typically struggle with. A more detailed write-up, including a small test, is available [here](https://chatdb.ai/post/naturalsql-vs-sqlcoder-for-text-to-sql).
+ # Usage
+
+ Make sure you have the correct version of the transformers library installed:
+
+ ```sh
+ pip install transformers==4.35.2
+ ```
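Before loading the model, it can help to confirm that the pinned version is the one actually installed. The following is a minimal sketch using only the standard library; the helper name `check_pinned_version` is hypothetical and not part of the original card.

```python
from importlib import metadata


def check_pinned_version(package: str, expected: str) -> str:
    """Return 'ok', 'mismatch', or 'missing' for the given package.

    Hypothetical helper for illustration; it only compares version strings.
    """
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in the current environment.
        return "missing"
    return "ok" if installed == expected else "mismatch"


# The version pinned in the install command above.
print("transformers:", check_pinned_version("transformers", "4.35.2"))
```

A result of `mismatch` does not necessarily mean the model will fail to load; it only signals that the environment differs from the one the card was written against.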
+
+ ### Loading the Model
+
+ Use the following Python code to load the model:
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ tokenizer = AutoTokenizer.from_pretrained("chatdb/natural-sql-7b")
+ model = AutoModelForCausalLM.from_pretrained(
+     "chatdb/natural-sql-7b",
+     device_map="auto",
+     torch_dtype=torch.float16,
+ )
+ ```
+
+ ### **License**
+
+ The model weights are licensed under `CC BY-SA 4.0`, with extra guidelines for responsible use expanded from the original model's [Deepseek](https://github.com/deepseek-ai/deepseek-coder/blob/main/LICENSE-MODEL) license.
+ You're free to use and adapt the model, even commercially.
+ If you alter the weights, such as through fine-tuning, you must publicly share your changes under the same `CC BY-SA 4.0` license.
+
+
+ ### Generating SQL
+
+ ```python
+ # `prompt` is a string that follows the Prompt Template section of this card.
+ inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
+ # Greedy decoding: a single sequence, no sampling, no beam search.
+ generated_ids = model.generate(
+     **inputs,
+     num_return_sequences=1,
+     eos_token_id=100001,
+     pad_token_id=100001,
+     max_new_tokens=400,
+     do_sample=False,
+     num_beams=1,
+ )
+
+ # The decoded text includes the prompt, so keep only what follows the last ```sql marker.
+ outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
+ print(outputs[0].split("```sql")[-1])
+ ```
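The final `split` in the snippet above keeps everything after the last SQL fence marker, which may still include a closing fence if the model emits one. A small post-processing helper could trim that. This is a sketch under that assumption; `extract_sql` is a hypothetical name, not something the original card defines.

```python
FENCE = "`" * 3  # three backticks, built this way to keep the example easy to embed


def extract_sql(decoded: str) -> str:
    """Return the text after the last SQL fence marker, minus any closing fence."""
    # Same idea as the card's split on the SQL fence marker: drop the echoed prompt.
    sql = decoded.split(FENCE + "sql")[-1]
    # Assumption: if the model closes the fence, nothing useful follows it.
    sql = sql.split(FENCE)[0]
    return sql.strip()


sample = "Here is the SQL query\n" + FENCE + "sql\nSELECT 1;\n" + FENCE + "\n"
print(extract_sql(sample))  # prints: SELECT 1;
```

If the decoded text contains no marker at all, the helper returns the text up to the first fence, stripped, so callers may want to validate the result before running it against a database.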
+ # Prompt Template
+
+ ```
+ # Task
+ Generate a SQL query to answer the following question: `{natural language question}`
+
+ ### PostgreSQL Database Schema
+ The query will run on a database with the following schema:
+
+ <SQL Table DDL Statements>
+
+ # SQL
+ Here is the SQL query that answers the question: `{natural language question}`
+ '''sql
+ ```
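The template can be filled in programmatically. The sketch below is one possible way to do it; `build_prompt` and `PROMPT_TEMPLATE` are hypothetical names. It assumes the final `'''sql` line in the template stands in for a triple-backtick `sql` fence (the generation snippet splits the output on that backtick marker), and that the prompt ends with a newline after the marker.

```python
FENCE = "`" * 3  # assumption: the template's '''sql line represents a backtick fence

PROMPT_TEMPLATE = (
    "# Task\n"
    "Generate a SQL query to answer the following question: `{question}`\n"
    "\n"
    "### PostgreSQL Database Schema\n"
    "The query will run on a database with the following schema:\n"
    "\n"
    "{schema}\n"
    "\n"
    "# SQL\n"
    "Here is the SQL query that answers the question: `{question}`\n"
    + FENCE + "sql\n"
)


def build_prompt(question: str, schema: str) -> str:
    """Fill the template with a natural language question and table DDL."""
    return PROMPT_TEMPLATE.format(question=question.strip(), schema=schema.strip())


prompt = build_prompt(
    "Show me the day with the most users joining",
    "CREATE TABLE users (user_id SERIAL PRIMARY KEY, created_at TIMESTAMP NOT NULL);",
)
print(prompt)
```

The resulting `prompt` string is what the tokenizer call in the generation snippet expects.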
+
+
+ # Example SQL Output
+
+ ### Example Schemas
+
+ ```sql
+ CREATE TABLE users (
+     user_id SERIAL PRIMARY KEY,
+     username VARCHAR(50) NOT NULL,
+     email VARCHAR(100) NOT NULL,
+     password_hash TEXT NOT NULL,
+     created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
+ );
+ CREATE TABLE projects (
+     project_id SERIAL PRIMARY KEY,
+     project_name VARCHAR(100) NOT NULL,
+     description TEXT,
+     start_date DATE,
+     end_date DATE,
+     owner_id INTEGER REFERENCES users(user_id)
+ );
+ CREATE TABLE tasks (
+     task_id SERIAL PRIMARY KEY,
+     task_name VARCHAR(100) NOT NULL,
+     description TEXT,
+     due_date DATE,
+     status VARCHAR(50),
+     project_id INTEGER REFERENCES projects(project_id)
+ );
+ CREATE TABLE taskassignments (
+     assignment_id SERIAL PRIMARY KEY,
+     task_id INTEGER REFERENCES tasks(task_id),
+     user_id INTEGER REFERENCES users(user_id),
+     assigned_date DATE NOT NULL DEFAULT CURRENT_TIMESTAMP
+ );
+ CREATE TABLE comments (
+     comment_id SERIAL PRIMARY KEY,
+     content TEXT NOT NULL,
+     created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
+     task_id INTEGER REFERENCES tasks(task_id),
+     user_id INTEGER REFERENCES users(user_id)
+ );
+ ```
+ ### Example SQL Outputs
+
+ **Question**: **Show me the day with the most users joining**
+ ```sql
+ SELECT created_at::DATE AS day, COUNT(*) AS user_count
+ FROM users
+ GROUP BY day
+ ORDER BY user_count DESC
+ LIMIT 1;
+ ```
+ **Question**: **Show me the project that has a task with the most comments**
+ ```sql
+ SELECT p.project_name, t.task_name, COUNT(c.comment_id) AS comment_count
+ FROM projects p
+ JOIN tasks t ON p.project_id = t.project_id
+ JOIN comments c ON t.task_id = c.task_id
+ GROUP BY p.project_name, t.task_name
+ ORDER BY comment_count DESC
+ LIMIT 1;
+ ```
+
+ **Question**: **What is the ratio of users with gmail addresses vs without?**
+ ```sql
+ SELECT
+     SUM(CASE WHEN email ILIKE '%@gmail.com%' THEN 1 ELSE 0 END)::FLOAT / NULLIF(SUM(CASE WHEN email NOT ILIKE '%@gmail.com%' THEN 1 ELSE 0 END), 0) AS gmail_ratio
+ FROM
+     users;
+ ```
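To make the logic of the last query concrete, here is a plain-Python sketch of the same arithmetic. It is not produced by the model; `gmail_ratio` is a hypothetical helper that mirrors the case-insensitive `ILIKE` match and the `NULLIF(..., 0)` guard against division by zero, assuming every email is a non-null string as the schema requires.

```python
from typing import List, Optional


def gmail_ratio(emails: List[str]) -> Optional[float]:
    """Ratio of gmail to non-gmail addresses, or None when the denominator is zero."""
    # ILIKE '%@gmail.com%' is a case-insensitive substring match.
    gmail = sum(1 for email in emails if "@gmail.com" in email.lower())
    other = len(emails) - gmail
    # NULLIF(other, 0) turns a zero denominator into NULL; None plays that role here.
    if other == 0:
        return None
    return gmail / other


print(gmail_ratio(["a@gmail.com", "b@GMAIL.com", "c@example.com"]))  # prints: 2.0
```

Returning `None` rather than raising keeps the behavior aligned with the SQL, where dividing by `NULL` yields `NULL` instead of an error.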