+---
+license: apache-2.0
+datasets:
+- Clinton/Text-to-sql-v1
+- b-mc2/sql-create-context
+- gretelai/synthetic_text_to_sql
+- knowrohit07/know_sql
+metrics:
+- rouge
+- bleu
+- fuzzy_match
+- exact_match
+base_model:
+- google/flan-t5-base
+pipeline_tag: text2text-generation
+library_name: transformers
+language:
+- en
+tags:
+- text2sql
+- transformers
+- flan-t5
+- seq2seq
+- qlora
+- peft
+- fine-tuning
+---
+# Model Card for text2sql-flan-t5-base-qlora-finetuned
+<!-- Provide a quick summary of what the model is/does. -->
+This model is a fine-tuned version of [Flan-T5 Base](https://huggingface.co/google/flan-t5-base) that converts natural language queries into SQL statements. It was adapted with **QLoRA (Quantized Low-Rank Adaptation)** via PEFT and trained on a concatenation of four public text-to-SQL datasets. A live demo is available, and the model can be loaded for inference directly from Hugging Face.
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+This model generates SQL queries from a natural language query and a provided context (for example, the schema's `CREATE TABLE` statements).
+It was fine-tuned using QLoRA with 4-bit quantization and PEFT on a combination of text-to-SQL datasets.
+On the reported metrics it substantially outperforms the original base model (see Results), though its 16.39% exact-match accuracy means generated queries should be validated before use.
+- **Developed by:** Aarohan Verma
+- **Model type:** Seq2Seq / Text-to-Text Generation (SQL Generation)
+- **Language(s) (NLP):** English
+- **License:** Apache-2.0
+- **Finetuned from model:** [google/flan-t5-base](https://huggingface.co/google/flan-t5-base)
+### Model Sources
+<!-- Provide the basic links for the model. -->
+- **Repository:** [https://huggingface.co/aarohanverma/text2sql-flan-t5-base-qlora-finetuned](https://huggingface.co/aarohanverma/text2sql-flan-t5-base-qlora-finetuned)
+- **Demo:** [Gradio Demo](https://huggingface.co/spaces/aarohanverma/text2sql-demo)
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+This model can be used directly for generating SQL queries from natural language inputs.
+It is particularly useful for applications in database querying and natural language interfaces for relational databases.
+### Downstream Use
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+The model can be further integrated into applications such as chatbots, data analytics platforms, and business intelligence tools to automate query generation.
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+This model is not designed for tasks outside text-to-SQL generation.
+It may not perform well for non-SQL language generation or queries outside the domain of structured data retrieval.
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+- **Bias:** The model's performance is influenced by the quality and diversity of the training data. It may underperform on SQL queries that deviate significantly from the training examples.
+- **Risks:** Inaccurate SQL generation may lead to unexpected query behavior, especially in safety-critical environments.
+- **Limitations:** The model may not generalize to complex SQL tasks that require deep domain knowledge beyond the training data.
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users should validate the generated SQL queries before deployment in production systems.
+Consider incorporating human-in-the-loop review for critical applications.
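
One way to act on this recommendation is to check that a generated query at least compiles against the target schema before it is run. The sketch below is a minimal illustration, assuming a SQLite-compatible schema and read-only `SELECT` queries. `validate_sql` and the sample schema are hypothetical and not part of this model's code, and other SQL dialects would need their own engine.

```python
import sqlite3


def validate_sql(schema_ddl: str, sql: str) -> tuple[bool, str]:
    """Check that `sql` is a single SELECT that compiles against `schema_ddl`.

    Hypothetical helper: assumes SQLite syntax. Returns (ok, error_message).
    """
    statement = sql.strip().rstrip(";").strip()
    # Deliberately strict: anything that is not a plain SELECT is rejected.
    if not statement.lower().startswith("select"):
        return False, "only SELECT statements are allowed"

    conn = sqlite3.connect(":memory:")  # throwaway database, never touches real data
    try:
        conn.executescript(schema_ddl)  # build empty tables from the schema
        # EXPLAIN compiles the statement without returning its result rows.
        conn.execute(f"EXPLAIN {statement}")
        return True, ""
    except (sqlite3.Error, sqlite3.Warning) as exc:
        return False, str(exc)
    finally:
        conn.close()


schema = (
    "CREATE TABLE customers (id INT PRIMARY KEY, name VARCHAR(100), country VARCHAR(50));"
)
print(validate_sql(schema, "SELECT name FROM customers WHERE country = 'USA';"))
print(validate_sql(schema, "SELECT email FROM customers;"))
```

A compile check like this catches unknown tables, unknown columns, and syntax errors, but it cannot tell whether the query answers the question that was asked, so human review remains relevant for critical applications.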
+## How to Get Started with the Model
+To get started, load the model from Hugging Face with the `transformers` library and run the example code in the Inference & Example Usage section below.
+A live demo is linked under Model Sources.
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+The model was fine-tuned on a concatenation of several publicly available text-to-SQL datasets:
+1. **[Clinton/Text-to-SQL v1](https://huggingface.co/datasets/Clinton/Text-to-sql-v1)**
+2. **[b-mc2/sql-create-context](https://huggingface.co/datasets/b-mc2/sql-create-context)**
+3. **[gretelai/synthetic_text_to_sql](https://huggingface.co/datasets/gretelai/synthetic_text_to_sql)**
+4. **[knowrohit07/know_sql](https://huggingface.co/datasets/knowrohit07/know_sql)**
+**Data Split:**
+- **Training:** 85%
+- **Validation:** 5%
+- **Testing:** 10%
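
The split proportions above can be reproduced with a seeded shuffle. This is a minimal sketch, not the author's preprocessing code: `split_dataset` is a hypothetical helper, the rows are placeholder integers, and the seed of 42 is borrowed from the reproducibility note in the training hyperparameters.

```python
import random


def split_dataset(rows, train_pct=85, val_pct=5, seed=42):
    """Shuffle `rows` deterministically and split into train/validation/test.

    Whatever remains after the train and validation slices (10% here)
    becomes the test set. Integer percentages avoid float rounding surprises.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))
```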
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing
+The raw data was preprocessed as follows:
+- **Cleaning:** Removal of extra whitespace and newlines, and standardization of column names (renamed to `query`, `context`, and `response`).
+- **Filtering:** Dropping examples with missing values and duplicates; retaining only rows where the prompt is ≤ 500 tokens and the response is ≤ 250 tokens.
+- **Tokenization:**
+Prompts are constructed in the format:
+```
+Context:
+{context}
+Query:
+{query}
+Response:
+```
+and tokenized with a maximum length of 512 for inputs and 256 for responses using [google/flan-t5-base](https://huggingface.co/google/flan-t5-base)'s tokenizer.
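
The prompt template and length filter described above might look like the following. This is a sketch under stated assumptions: `clean_text`, `build_prompt`, and `within_limits` are hypothetical names, and `count_tokens` uses a whitespace split purely as a stand-in so the example runs without downloading a tokenizer. In a real pipeline the counts would come from the Flan-T5 tokenizer.

```python
import re

MAX_PROMPT_TOKENS = 500    # filtering threshold for prompts, from the card
MAX_RESPONSE_TOKENS = 250  # filtering threshold for responses, from the card


def clean_text(text: str) -> str:
    """Collapse runs of whitespace and newlines into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def build_prompt(context: str, query: str) -> str:
    """Assemble the Context/Query/Response prompt format shown above."""
    return f"Context:\n{clean_text(context)}\nQuery:\n{clean_text(query)}\nResponse:\n"


def count_tokens(text: str) -> int:
    """Stand-in token counter (whitespace split); swap in a real tokenizer."""
    return len(text.split())


def within_limits(prompt: str, response: str) -> bool:
    """Keep a row only if both prompt and response fit the length limits."""
    return (count_tokens(prompt) <= MAX_PROMPT_TOKENS
            and count_tokens(response) <= MAX_RESPONSE_TOKENS)


prompt = build_prompt("CREATE TABLE t (id INT);", "How many   rows are in t?")
print(prompt)
print(within_limits(prompt, "SELECT COUNT(*) FROM t"))
```

Filtering at 500 and 250 tokens leaves a small margin below the 512 and 256 tokenization limits, so rows that pass the filter are not truncated.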
+#### Training Hyperparameters
+- **Epochs:** 6
+- **Batch Sizes:**
+  - Training: 64 per device
+  - Evaluation: 64 per device
+- **Gradient Accumulation:** 2 steps
+- **Learning Rate:** 2e-4
+- **Optimizer:** `adamw_bnb_8bit` (memory-efficient variant of AdamW)
+- **LR Scheduler:** Cosine scheduler with a warmup ratio of 10%
+- **Quantization:** 4-bit NF4 (with double quantization) using `torch.bfloat16`
+- **LoRA Parameters:**
+  - **Rank (r):** 32
+  - **Alpha:** 64
+  - **Dropout:** 0.1
+  - **Target Modules:** `["q", "v"]`
+- **Checkpointing:**
+  - Model saved at the end of every epoch
+  - Early stopping with a patience of 2 epochs based on evaluation loss
+- **Reproducibility:** Random seeds are set across Python, NumPy, and PyTorch (seed = 42)
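
To make the batch and scheduler settings concrete, the sketch below computes the effective batch size per device and the learning rate at a few points of a linear-warmup, cosine-decay schedule. It is an illustration only: `total_steps` is a made-up value (the card does not report the number of optimizer steps), and `lr_at_step` is a hypothetical helper that follows the usual warmup-then-half-cosine shape rather than reproducing the trainer's internals.

```python
import math

PER_DEVICE_BATCH = 64
GRAD_ACCUM_STEPS = 2
BASE_LR = 2e-4
WARMUP_RATIO = 0.10

# Samples contributing to one optimizer update on a single device.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS


def lr_at_step(step: int, total_steps: int) -> float:
    """Linear warmup to BASE_LR, then cosine decay to zero."""
    warmup_steps = round(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return BASE_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # hypothetical; not reported in the card
print("effective batch per device:", effective_batch)
for step in (0, 50, 100, 550, 1000):
    print(step, f"{lr_at_step(step, total_steps):.2e}")
```

With these settings the learning rate peaks at 2e-4 once the first 10% of steps have elapsed and falls back to half that value midway through the decay phase.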
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+Evaluation metrics used:
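+- **ROUGE:** Measures n-gram overlap between generated and reference SQL (ROUGE-1 for unigrams, ROUGE-2 for bigrams, ROUGE-L for the longest common subsequence).
+- **BLEU:** Measures modified n-gram precision of the generated SQL against the reference, with a penalty for overly short outputs.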
+- **Fuzzy Match Score:** Uses token-set similarity to provide a soft match percentage.
+- **Exact Match Accuracy:** Percentage of queries that exactly match the reference SQL.
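
As a rough illustration of the two match-based metrics, the sketch below computes exact match after whitespace and case normalization, plus a token-set Jaccard score as a simple soft match. These are stand-ins: the card does not specify the normalization or the fuzzy-matching library used, so `normalize`, `exact_match`, and `token_set_score` are hypothetical and will not reproduce the reported numbers.

```python
def normalize(sql: str) -> str:
    """Lowercase, drop a trailing semicolon, and collapse whitespace.

    Simplification: lowercasing also affects string literals such as 'USA'.
    """
    return " ".join(sql.strip().rstrip(";").lower().split())


def exact_match(prediction: str, reference: str) -> bool:
    """True when the normalized strings are identical."""
    return normalize(prediction) == normalize(reference)


def token_set_score(prediction: str, reference: str) -> float:
    """Jaccard similarity of the two token sets, as a percentage."""
    pred_tokens = set(normalize(prediction).split())
    ref_tokens = set(normalize(reference).split())
    if not pred_tokens and not ref_tokens:
        return 100.0
    return 100.0 * len(pred_tokens & ref_tokens) / len(pred_tokens | ref_tokens)


reference = "SELECT name FROM customers WHERE country = 'USA';"
prediction = "select name from customers where country = 'UK'"
print(exact_match(prediction, reference))
print(round(token_set_score(prediction, reference), 2))
```

The example pair differs in a single literal, so it fails exact match while still scoring high on the soft match. That gap is the same pattern visible in the reported fuzzy-match and exact-match numbers.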
+### Results
+The table below summarizes the evaluation metrics comparing the original base model with the fine-tuned model:
+| **Metric** | **Original Model** | **Fine-Tuned Model** | **Improvement Commentary** |
+|---|---|---|---|
+| **ROUGE-1** | 0.03369 | **0.69143** | Over 20× increase; much higher unigram overlap with the reference SQL. |
+| **ROUGE-2** | 0.00817 | **0.54533** | Nearly 67× increase; much higher bigram overlap. |
+| **ROUGE-L** | 0.03056 | **0.66429** | More than 21× increase; improved sequence similarity. |
+| **BLEU Score** | 0.00367 | **0.31698** | Approximately 86× increase; higher n-gram precision against the reference SQL. |
+| **Fuzzy Match Score** | 11.31% | **81.98%** | Gain of 70.67 percentage points; generated SQL aligns much more closely with the reference queries. |
+| **Exact Match Accuracy** | 0.00% | **16.39%** | From zero to roughly one in six queries matching the reference exactly. |
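
The multipliers quoted in the commentary column follow from dividing each fine-tuned score by its baseline. The short check below recomputes them from the table values. Exact match is left out of the ratio because its baseline is zero.

```python
# (original, fine-tuned) scores copied from the results table
scores = {
    "ROUGE-1": (0.03369, 0.69143),
    "ROUGE-2": (0.00817, 0.54533),
    "ROUGE-L": (0.03056, 0.66429),
    "BLEU": (0.00367, 0.31698),
}

ratios = {name: tuned / base for name, (base, tuned) in scores.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")

# Percentage metrics are better compared as absolute differences.
fuzzy_gain = 81.98 - 11.31
print(f"Fuzzy match: +{fuzzy_gain:.2f} percentage points")
```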
+#### Summary
+The fine-tuned model improves substantially on the base model across all reported metrics. Exact-match accuracy is still 16.39%, so most generated queries differ from the reference in some way and should be validated before use.
+## 🔍 Inference & Example Usage
+### Inference Code
+Below is the recommended Python code for running inference on the fine-tuned model:
+```python
+import torch
+from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
+import logging
+# Set up logging
+logging.basicConfig(
+  level=logging.INFO,
+  format="%(asctime)s - %(levelname)s - %(message)s",
+)
+logger = logging.getLogger(__name__)
+# Set device (GPU if available)
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+# Load the fine-tuned model and tokenizer
+model_name = "aarohanverma/text2sql-flan-t5-base-qlora-finetuned"
+model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).to(device)
+tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
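+def run_inference(prompt_text: str) -> str:
+  """
+  Runs inference using deterministic beam-search decoding.
+  """
+  # Truncate to the 512-token input limit used during fine-tuning
+  inputs = tokenizer(prompt_text, return_tensors="pt", truncation=True, max_length=512).to(device)
+  with torch.no_grad():
+    generated_ids = model.generate(
+        **inputs,  # passes both input_ids and attention_mask
+        max_new_tokens=250,
+        do_sample=False,  # beam search is deterministic; temperature applies only when sampling
+        num_beams=3,
+        early_stopping=True,
+    )
+  return tokenizer.decode(generated_ids[0], skip_special_tokens=True)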
+# Example usage:
+context = (
+  "CREATE TABLE customers (id INT PRIMARY KEY, name VARCHAR(100), country VARCHAR(50)); "
+  "CREATE TABLE orders (order_id INT PRIMARY KEY, customer_id INT, total_amount DECIMAL(10,2), "
+  "order_date DATE, FOREIGN KEY (customer_id) REFERENCES customers(id)); "
+  "INSERT INTO customers (id, name, country) VALUES (1, 'Alice', 'USA'), (2, 'Bob', 'UK'), "
+  "(3, 'Charlie', 'Canada'), (4, 'David', 'USA'); "
+  "INSERT INTO orders (order_id, customer_id, total_amount, order_date) VALUES "
+  "(101, 1, 500, '2024-01-15'), (102, 2, 300, '2024-01-20'), "
+  "(103, 1, 700, '2024-02-10'), (104, 3, 450, '2024-02-15'), "
+  "(105, 4, 900, '2024-03-05');"
+)
+query = (
+  "Retrieve the total order amount for each customer, showing only customers from the USA, "
+  "and sort the result by total order amount in descending order."
+)
+# Construct the prompt
+sample_prompt = f"""Context:
+{context}
+Query:
+{query}
+Response:
+"""
+logger.info("Running inference with beam search decoding.")
+generated_sql = run_inference(sample_prompt)
+print("Prompt:")
+print("Context:")
+print(context)
+print("\nQuery:")
+print(query)
+print("\nResponse:")
+print(generated_sql)
+# Expected Output:
+# SELECT customers.name, SUM(orders.total_amount) as total_amount FROM customers
+# INNER JOIN orders ON customers.id = orders.customer_id
+# WHERE customers.country = 'USA'
+# GROUP BY customers.name
+# ORDER BY total_amount DESC;
+```
+## Citation
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+```bibtex
+@misc{aarohanverma_text2sql_2025,
+title={Text-to-SQL Fine-Tuned Model (Flan-T5 Base)},
+author={Aarohan Verma},
+year={2025},
+url={https://huggingface.co/aarohanverma/text2sql-flan-t5-base-qlora-finetuned}
+}
+```
+## Model Card Contact
+For inquiries or further information, please contact:
+- LinkedIn: https://www.linkedin.com/in/aarohanverma/
+- Email: [email protected]