Yuk050 committed · Commit 36d6420 · verified · Parent(s): ec89761

Update README.md

Files changed (1):
  1. README.md (+115 −3)
README.md CHANGED
```diff
@@ -1,12 +1,124 @@
 ---
-license: gemma
+pretty_name: Gemma 3 1B Text-to-SQL Finetuned
 datasets:
 - gretelai/synthetic_text_to_sql
 language:
 - en
 base_model:
 - google/gemma-3-1b-it
 pipeline_tag: text2text-generation
+tags:
+- text-to-sql
+- qlora
+- finetuned
+- gemma
+- llm
+- sql
+license: gemma
 ---
 
-Test
```

The remaining additions replace the `Test` placeholder with the full model card body:

# Model Card: Gemma 3 1B Text-to-SQL Finetuned

## Model Description

This model is a finetuned version of the `google/gemma-3-1b-it` large language model, specifically adapted for the text-to-SQL task. It leverages Quantized Low-Rank Adaptation (QLoRA) for efficient finetuning, making it suitable for deployment and inference on systems with limited computational resources.

The primary function of this model is to translate natural language questions and provided database schemas into executable SQL queries. This capability is crucial for applications requiring natural language interaction with databases, such as business intelligence tools, data analysis platforms, and conversational AI agents.

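The following is a minimal inference sketch using the standard `transformers` API. The repository id is a placeholder (the card does not state one), and the sketch assumes the LoRA adapter was merged into the base weights before upload:

```python
# Hedged sketch: generate SQL from the finetuned model with transformers.
# "Yuk050/gemma-3-1b-text-to-sql" is a hypothetical repo id; substitute the
# real one. If the adapter was published unmerged, load it with peft instead.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Yuk050/gemma-3-1b-text-to-sql"  # placeholder, not confirmed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The prompt mirrors the training format described under "Training Data".
prompt = (
    "Given the following database schema:\n\n"
    "CREATE TABLE Employees (id INT, name VARCHAR(255), salary INT);\n\n"
    "Generate the SQL query for: Select all employees with salary greater than 50000"
)
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```
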
## Intended Use

This model is intended for research and development purposes related to text-to-SQL generation. It can be used to:

* Generate SQL queries from natural language prompts and database schemas.
* Serve as a component in larger systems that require natural language interaction with databases.
* Further research into efficient finetuning techniques for large language models.

### Out-of-Scope Use Cases

This model is not intended for:

* Generating SQL queries for highly sensitive or mission-critical systems without thorough validation and human oversight.
* Deployment in production environments without rigorous testing and adherence to security best practices.
* Generating SQL for databases with unknown or complex schemas without proper adaptation and training.

## Training Data

The model was finetuned on the `gretelai/synthetic_text_to_sql` dataset (see the Citation section below). This dataset is a high-quality, synthetically generated collection of text-to-SQL samples. Key characteristics of the dataset include:

* **Size**: 105,851 records (100,000 for training, 5,851 for testing).
* **Content**: Each record includes a natural language prompt (`sql_prompt`), the database schema (`sql_context`, as `CREATE TABLE` statements), the corresponding SQL query (`sql`), and an explanation of the query (`sql_explanation`).
* **Diversity**: Covers 100 distinct domains/verticals and a wide range of SQL complexity levels (e.g., aggregations, joins, subqueries, window functions).

The training data was transformed into a conversational format, where the user provides the database schema and natural language query, and the assistant responds with the SQL query. An example of the input format is:

```
Given the following database schema:

CREATE TABLE Employees (id INT, name VARCHAR(255), salary INT);

Generate the SQL query for: Select all employees with salary greater than 50000
```

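A preprocessing sketch consistent with this format, using the dataset's documented column names (`sql_prompt`, `sql_context`, `sql`); the exact template used for this training run is an assumption:

```python
# Hedged sketch: map gretelai/synthetic_text_to_sql records into the
# conversational format shown above.
from datasets import load_dataset

def to_conversation(example):
    user_turn = (
        "Given the following database schema:\n\n"
        f"{example['sql_context']}\n\n"
        f"Generate the SQL query for: {example['sql_prompt']}"
    )
    return {
        "messages": [
            {"role": "user", "content": user_turn},
            {"role": "assistant", "content": example["sql"]},
        ]
    }

train_data = load_dataset("gretelai/synthetic_text_to_sql", split="train")
train_data = train_data.map(to_conversation)
```
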
## Training Procedure

The model was finetuned using the QLoRA technique, implemented with the `unsloth` library, and trained with the `SFTTrainer` from the `trl` library. The key settings are listed below, followed by a configuration sketch.

**Base Model**: `google/gemma-3-1b-it`

**Finetuning Parameters (QLoRA)**:

* **LoRA Rank (`r`)**: 16
* **LoRA Alpha (`lora_alpha`)**: 16
* **LoRA Dropout (`lora_dropout`)**: 0.05
* **Bias**: `none`
* **Target Modules**: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
* **Gradient Checkpointing**: Enabled (`unsloth` optimized)

**Training Arguments (`SFTTrainer`)**:

* **Per Device Train Batch Size**: 2
* **Gradient Accumulation Steps**: 4
* **Warmup Steps**: 5
* **Max Steps**: 100 (can be adjusted for full dataset training)
* **Learning Rate**: 2e-4
* **Optimizer**: `adamw_8bit`
* **Precision**: `bf16` (if supported by the GPU), otherwise `fp16`

## Model Architecture

The Gemma 3 1B model is a decoder-only transformer architecture. During QLoRA finetuning, low-rank adapters are injected into the specified layers, allowing for efficient training by only updating a small fraction of the model's parameters while keeping the majority of the pre-trained weights frozen in 4-bit quantized form.

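As general background (the standard LoRA formulation, not specific to this card): each targeted weight matrix $W_0$ stays frozen while a trainable low-rank pair $(B, A)$ is learned, so the adapted layer computes

$$
h = W_0 x + \frac{\alpha}{r} B A x, \qquad W_0 \in \mathbb{R}^{d \times k}\ \text{(frozen, 4-bit)}, \quad B \in \mathbb{R}^{d \times r}, \quad A \in \mathbb{R}^{r \times k}.
$$

With this card's settings ($r = 16$, $\alpha = 16$), the scaling factor $\alpha / r$ equals 1, and only $A$ and $B$ receive gradient updates.
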
## Performance and Limitations

Due to the synthetic nature of the training data, the model's performance on real-world, noisy, or highly complex database schemas may vary. It is recommended to perform further evaluation and potentially finetune on domain-specific data for production use cases.

**Limitations include:**

* **Schema Complexity**: May struggle with highly intricate database schemas or those with ambiguous column names.
* **Natural Language Ambiguity**: Performance can be affected by ambiguous or underspecified natural language queries.
* **SQL Dialect**: Primarily trained on standard SQL syntax; may require further adaptation for specific SQL dialects (e.g., PostgreSQL, MySQL, SQL Server).

## Environmental Impact

Finetuning with QLoRA significantly reduces the computational resources and energy consumption compared to full finetuning. The specific energy consumption for this finetuning run would depend on the hardware used and the duration of training.

## Citation

If you use this model or the finetuning approach, please consider citing the original Gemma model and the `gretelai/synthetic_text_to_sql` dataset:

```bibtex
@article{gemma2024,
  author = {Google},
  title  = {Gemma: A Family of Lightweight, State-of-the-Art Open Models},
  year   = {2024},
  url    = {https://ai.google.dev/gemma}
}

@software{gretel-synthetic-text-to-sql-2024,
  author = {Meyer, Yev and Emadi, Marjan and Nathawani, Dhruv and Ramaswamy, Lipika and Boyd, Kendrick and Van Segbroeck, Maarten and Grossman, Matthew and Mlocek, Piotr and Newberry, Drew},
  title  = {{Synthetic-Text-To-SQL}: A synthetic dataset for training language models to generate SQL queries from natural language prompts},
  month  = {April},
  year   = {2024},
  url    = {https://huggingface.co/datasets/gretelai/synthetic-text-to-sql}
}
```