Model Card

This model is a fine-tuned version of Flan-T5 Base optimized to convert natural language queries into SQL statements. It leverages QLoRA (Quantized Low-Rank Adaptation) with PEFT for efficient adaptation and has been trained on a concatenation of several high-quality text-to-SQL datasets. A live demo is available, and users can clone and run inference directly from Hugging Face.

Model Details

Model Description

This model is designed to generate SQL queries based on a provided natural language context and query. It has been fine-tuned using QLoRA with 4-bit quantization and PEFT on a diverse text-to-SQL dataset. The model demonstrates significant improvements over the original base model, making it highly suitable for practical text-to-SQL applications.

  • Developed by: Aarohan Verma
  • Model type: Seq2Seq / Text-to-Text Generation (SQL Generation)
  • Language(s) (NLP): English
  • License: Apache-2.0
  • Finetuned from model: google/flan-t5-base

Uses

Direct Use

This model can be used directly for generating SQL queries from natural language inputs. It is particularly useful for applications in database querying and natural language interfaces for relational databases.

Downstream Use

The model can be further integrated into applications such as chatbots, data analytics platforms, and business intelligence tools to automate query generation.

Out-of-Scope Use

This model is not designed for tasks outside text-to-SQL generation. It may not perform well for non-SQL language generation or queries outside the domain of structured data retrieval.

Risks and Limitations

  • Risks: Inaccurate SQL generation may lead to unexpected query behavior, especially in safety-critical environments.
  • Limitations: The model may struggle with highly complex schemas and queries due to limited training data and inherent model capability constraints, particularly for tasks that require deep, domain-specific knowledge.

Recommendations

Users should validate the generated SQL queries before deployment in production systems. Consider incorporating human-in-the-loop review for critical applications.
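One lightweight validation step is to dry-run each generated statement against a throwaway in-memory SQLite copy of the schema before it ever touches production data. The sketch below is illustrative only (it is not part of the released model, and SQLite's dialect differs from other engines):

```python
import sqlite3

def is_valid_sql(sql: str, schema: str) -> bool:
    """Dry-run a generated statement with EXPLAIN against an in-memory
    SQLite copy of the schema; returns False on any syntax/schema error."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)                   # build throwaway schema
        conn.execute(f"EXPLAIN {sql.rstrip(';')}")   # parse and plan without executing
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()

schema = "CREATE TABLE students (id INT, name TEXT, age INT);"
print(is_valid_sql("SELECT name FROM students WHERE age = 15;", schema))  # True
print(is_valid_sql("SELECT nme FROM students;", schema))                  # False
```

A check like this catches syntax errors and references to nonexistent columns, but it cannot confirm that a syntactically valid query answers the user's question, so human review remains important for critical applications.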

How to Get Started with the Model

To get started, clone the repository or download the model from Hugging Face, then use the provided example code to run inference. Detailed instructions and the live demo are available in this model card.

Training Details

Training Data

The model was fine-tuned on a concatenation of several publicly available text-to-SQL datasets:

  1. Clinton/Text-to-SQL v1
  2. b-mc2/sql-create-context
  3. gretelai/synthetic_text_to_sql
  4. knowrohit07/know_sql

Data Split:

| Split      | Percentage | Number of Samples |
|------------|------------|-------------------|
| Training   | 85%        | 338,708           |
| Validation | 5%         | 19,925            |
| Testing    | 10%        | 39,848            |

Training Procedure

Preprocessing

The raw data was preprocessed as follows:

  • Cleaning: Removal of extra whitespaces/newlines and standardization of columns (renaming to query, context, and response).
  • Filtering: Dropping examples with missing values and duplicates; retaining only rows where the prompt is ≤ 500 tokens and the response is ≤ 250 tokens.
  • Tokenization:

Prompts are constructed in the format:

Context:
{context}

Query:
{query}

Response:

and tokenized with a maximum length of 512 for inputs and 256 for responses using google/flan-t5-base's tokenizer.
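The prompt template above can be reproduced with a small helper (a sketch; the argument names mirror the preprocessed `context` and `query` columns described earlier):

```python
def build_prompt(context: str, query: str) -> str:
    """Assemble the Context/Query/Response prompt format used during fine-tuning."""
    return f"Context:\n{context}\n\nQuery:\n{query}\n\nResponse:\n"

prompt = build_prompt(
    "CREATE TABLE students (id INT, name VARCHAR(100), age INT);",
    "Retrieve the names of students who are 15 years old.",
)
print(prompt)
```

The trailing `Response:` line cues the decoder to begin generating the SQL statement.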

Training Hyperparameters

  • Epochs: 6
  • Batch Sizes:
    Training: 64 per device
    Evaluation: 64 per device
  • Gradient Accumulation: 2 steps
  • Learning Rate: 2e-4
  • Optimizer: adamw_bnb_8bit (memory-efficient variant of AdamW)
  • LR Scheduler: Cosine scheduler with a warmup ratio of 10%
  • Quantization: 4-bit NF4 (with double quantization) using torch.bfloat16
  • LoRA Parameters:
    Rank (r): 32
    Alpha: 64
    Dropout: 0.1
    Target Modules: ["q", "v"]
  • Checkpointing:
    Model saved at the end of every epoch
    Early stopping with a patience of 2 epochs based on evaluation loss
  • Reproducibility: Random seeds are set across Python, NumPy, and PyTorch (seed = 42)
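The quantization and LoRA settings above map onto `transformers` and `peft` configuration objects roughly as follows (a sketch of the setup, not a verbatim training script; assumes `transformers`, `peft`, and `bitsandbytes` are installed):

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization with double quantization, bfloat16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapters on the attention query/value projections
lora_config = LoraConfig(
    r=32,                      # rank
    lora_alpha=64,
    lora_dropout=0.1,
    target_modules=["q", "v"],
    task_type="SEQ_2_SEQ_LM",
)
```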

Evaluation

The model was evaluated on 39,848 test samples, representing 10% of the dataset.

Metrics

Evaluation metrics used:

  • ROUGE: Measures n-gram overlap between generated and reference SQL.
  • BLEU: Assesses the quality of translation from natural language to SQL.
  • Fuzzy Match Score: Uses token-set similarity to provide a soft match percentage.
  • Exact Match Accuracy: Percentage of queries that exactly match the reference SQL.
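The two string-level metrics can be approximated with the standard library as shown below (a sketch; the actual evaluation may use a dedicated fuzzy-matching library, so scores will not be identical):

```python
from difflib import SequenceMatcher

def fuzzy_match(pred: str, ref: str) -> float:
    """Token-set similarity: compare the sorted unique tokens of each query."""
    a = " ".join(sorted(set(pred.lower().split())))
    b = " ".join(sorted(set(ref.lower().split())))
    return 100.0 * SequenceMatcher(None, a, b).ratio()

def exact_match(pred: str, ref: str) -> bool:
    """Whitespace-normalized exact comparison."""
    return " ".join(pred.split()) == " ".join(ref.split())

ref = "SELECT name FROM students WHERE age = 15;"
print(exact_match("SELECT name  FROM students WHERE age = 15;", ref))  # True
print(round(fuzzy_match("SELECT name FROM students WHERE age=15;", ref), 1))
```

Token-set similarity deliberately ignores token order and duplication, which makes it a softer signal than exact match for SQL, where clause reordering is often (but not always) semantics-preserving.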

Results

The table below summarizes the evaluation metrics comparing the original base model with the fine-tuned model:

| Metric              | Original Model | Fine-Tuned Model | Comments                                                                    |
|---------------------|----------------|------------------|-----------------------------------------------------------------------------|
| ROUGE-1             | 0.05647        | 0.75388          | Over 13× increase; indicates much better content capture.                   |
| ROUGE-2             | 0.01563        | 0.61039          | Nearly 39× improvement; higher n-gram quality.                              |
| ROUGE-L             | 0.05031        | 0.72628          | More than 14× increase; improved sequence similarity.                       |
| BLEU Score          | 0.00314        | 0.47198          | Approximately 150× increase; demonstrates significant fluency gains.        |
| Fuzzy Match Score   | 13.98%         | 85.62%           | Substantial improvement; generated SQL aligns much closer with references.  |
| Exact Match Accuracy| 0.00%          | 18.29%           | Non-zero accuracy achieved; critical for production-readiness.              |

Summary

The fine-tuned model shows dramatic improvements across all evaluation metrics, proving its effectiveness in generating accurate and relevant SQL queries from natural language inputs.

πŸ” Inference & Example Usage

Inference Code

Below is the recommended Python code for running inference on the fine-tuned model:

import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import logging

# Set up logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
logger = logging.getLogger(__name__)

# Ensure device is set (GPU if available)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the fine-tuned model and tokenizer
model_name = "aarohanverma/text2sql-flan-t5-base-qlora-finetuned"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Ensure decoder start token is set
if model.config.decoder_start_token_id is None:
    model.config.decoder_start_token_id = tokenizer.pad_token_id

def run_inference(prompt_text: str) -> str:
    """
    Runs inference on the fine-tuned model using beam search with fixes for repetition.
    """
    inputs = tokenizer(prompt_text, return_tensors="pt", truncation=True, max_length=512).to(device)

    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        decoder_start_token_id=model.config.decoder_start_token_id,
        max_new_tokens=100,
        num_beams=5,              # beam search (temperature is ignored without sampling)
        repetition_penalty=1.2,   # discourage degenerate repetition
        early_stopping=True,
    )

    generated_sql = tokenizer.decode(generated_ids[0], skip_special_tokens=True)

    generated_sql = generated_sql.split(";")[0] + ";"  # Keep only the first valid SQL query

    return generated_sql

# Example usage:
context = (
    "CREATE TABLE students (id INT PRIMARY KEY, name VARCHAR(100), age INT, grade CHAR(1)); "
    "INSERT INTO students (id, name, age, grade) VALUES "
    "(1, 'Alice', 14, 'A'), (2, 'Bob', 15, 'B'), "
    "(3, 'Charlie', 14, 'A'), (4, 'David', 16, 'C'), (5, 'Eve', 15, 'B');"
)

query = "Retrieve the names of students who are 15 years old."

# Construct the prompt
sample_prompt = f"""Context:
{context}

Query:
{query}

Response:
"""

logger.info("Running inference with beam search decoding.")
generated_sql = run_inference(sample_prompt)

print("Prompt:")
print("Context:")
print(context)
print("\nQuery:")
print(query)
print("\nResponse:")
print(generated_sql)

# EXPECTED RESPONSE: SELECT name FROM students WHERE age = 15; 

Citation

BibTeX:

@misc{aarohan_verma_2025,
    author    = {Aarohan Verma},
    title     = {text2sql-flan-t5-base-qlora-finetuned},
    year      = {2025},
    url       = {https://huggingface.co/aarohanverma/text2sql-flan-t5-base-qlora-finetuned},
    doi       = {10.57967/hf/4887},
    publisher = {Hugging Face}
}

Model Card Contact

For inquiries or further information, please contact:

LinkedIn: https://www.linkedin.com/in/aarohanverma/

Email: [email protected]
