---
license: mit
datasets:
- gretelai/synthetic_text_to_sql
language:
- en
inference:
  parameters:
    do_sample: False
    max_new_tokens: 250
    temperature: 0.7
library_name: transformers
pipeline_tag: text-generation
---

# Gemma 2B Fine-Tuned SQL Generator

## Introduction

The Gemma 2B SQL Generator is a version of Google's Gemma 2B model fine-tuned to write SQL queries from a natural-language request and the SQL context it applies to (table definitions and, optionally, sample rows). It is meant to help developers and analysts write SQL queries faster and with fewer manual mistakes.

## Model Details

- **Base model:** Gemma 2B
- **Fine-tuning:** Fine-tuned for text-to-SQL generation on the `gretelai/synthetic_text_to_sql` dataset.
- **Training loss:** Reached a training loss of 0.3. Training loss shows how well the model fits its training data; it is not a direct measure of the accuracy of the generated queries.
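
Assuming the reported figure is the mean per-token cross-entropy (in nats) that `transformers` computes for causal language models (the card does not say how it was measured), it can be read as a training perplexity, where $x_1, \dots, x_T$ are the target tokens and $p_\theta$ is the model:

$$
\mathcal{L} = -\frac{1}{T}\sum_{t=1}^{T}\log p_\theta(x_t \mid x_{<t}) = 0.3,
\qquad
\mathrm{PPL} = e^{\mathcal{L}} = e^{0.3} \approx 1.35
$$

Under that assumption, the model is on average about as uncertain about the next training token as a choice among 1.35 equally likely options.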

## Installation

To set up the environment for the SQL Generator, install PyTorch and Hugging Face Transformers:

```bash
pip install torch
pip install -U transformers
```

## How to Fine-Tune

The fine-tuning code is available on GitHub: [theSuriya/Gemma-SQL-Generator](https://github.com/theSuriya/Gemma-SQL-Generator/tree/main).

## Inference

The example below loads the model, asks for the price of a laptop given a `products` table, and prints only the generated SQL.

```python
# Load model directly
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "suriya7/Gemma2B-Finetuned-Sql-Generator"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Gemma turn-based prompt: instructions, the question ("Prompt") and the SQL context
prompt = """
<start_of_turn>user
You are an intelligent AI specialized in generating SQL queries.
Your task is to assist users in formulating SQL queries to retrieve specific information from a database.
Please provide the SQL query corresponding to the given prompt and context:

Prompt:
find the price of laptop

Context:
CREATE TABLE products (
    product_id INT,
    product_name VARCHAR(100),
    category VARCHAR(50),
    price DECIMAL(10, 2),
    stock_quantity INT
);

INSERT INTO products (product_id, product_name, category, price, stock_quantity)
VALUES
    (1, 'Smartphone', 'Electronics', 599.99, 100),
    (2, 'Laptop', 'Electronics', 999.99, 50),
    (3, 'Headphones', 'Electronics', 99.99, 200),
    (4, 'T-shirt', 'Apparel', 19.99, 300),
    (5, 'Jeans', 'Apparel', 49.99, 150);<end_of_turn>
<start_of_turn>model
"""

# add_special_tokens=True prepends Gemma's <bos> token
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=True).input_ids

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)
inputs = inputs.to(device)

# Stop at <eos> or at the end of the model's turn; increase max_new_tokens if needed
end_of_turn_id = tokenizer.convert_tokens_to_ids("<end_of_turn>")
generated_ids = model.generate(
    inputs,
    max_new_tokens=1000,
    do_sample=True,
    temperature=0.7,
    eos_token_id=[tokenizer.eos_token_id, end_of_turn_id],
    pad_token_id=tokenizer.eos_token_id,
)

# Extract only the model's answer: decode just the tokens generated after the prompt
new_tokens = generated_ids[0][inputs.shape[-1]:]
model_answer = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
print(model_answer)
```
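
The prompt above hard-codes one question and one schema. The following is a minimal sketch of two hypothetical helpers (`build_prompt` and `run_on_context` are not part of this model or of `transformers`): the first fills the same prompt layout for any question and context, assuming the fine-tuned model expects exactly that layout; the second runs a generated query against the context in an in-memory SQLite database as a quick sanity check. SQLite is a swapped-in choice for illustration, and its dialect differs from other databases, so a query that fails here is not necessarily wrong elsewhere. The `candidate` query is an illustrative example, not actual model output.

```python
import sqlite3

# Same layout as the hard-coded prompt in the Inference example
# (assumption: the fine-tuned model expects exactly this structure).
PROMPT_TEMPLATE = """
<start_of_turn>user
You are an intelligent AI specialized in generating SQL queries.
Your task is to assist users in formulating SQL queries to retrieve specific information from a database.
Please provide the SQL query corresponding to the given prompt and context:

Prompt:
{question}

Context:
{context}<end_of_turn>
<start_of_turn>model
"""


def build_prompt(question: str, context: str) -> str:
    """Hypothetical helper: fill the prompt template with a question and a SQL context."""
    return PROMPT_TEMPLATE.format(question=question.strip(), context=context.strip())


def run_on_context(sql: str, context: str) -> list:
    """Hypothetical helper: load the context (CREATE/INSERT statements) into an
    in-memory SQLite database, run one generated query, and return its rows.
    Raises sqlite3.Error if the query is invalid for this schema in SQLite."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(context)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    context = """
CREATE TABLE products (
    product_id INT,
    product_name VARCHAR(100),
    category VARCHAR(50),
    price DECIMAL(10, 2),
    stock_quantity INT
);
INSERT INTO products (product_id, product_name, category, price, stock_quantity)
VALUES
    (1, 'Smartphone', 'Electronics', 599.99, 100),
    (2, 'Laptop', 'Electronics', 999.99, 50);
"""
    # Pass this string to the tokenizer/generate steps from the Inference example.
    prompt = build_prompt("find the price of laptop", context)

    # Illustrative query of the kind the model might return (not real model output).
    candidate = "SELECT price FROM products WHERE product_name = 'Laptop';"
    print(run_on_context(candidate, context))
```

Executing the context separately from the query keeps the check read-only with respect to any real database: only a throwaway in-memory copy is touched.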