How to Use
import torch
from transformers import T5ForConditionalGeneration, AutoTokenizer
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")  # fall back to CPU when no GPU is available
tokenizer = AutoTokenizer.from_pretrained("LarkAI/codet5p-770m_nl2sql_oig")
model = T5ForConditionalGeneration.from_pretrained("LarkAI/codet5p-770m_nl2sql_oig").to(device)
text = "Given the following schema:\ntrack (Track_ID, Name, Location, Seating, Year_Opened)\nrace (Race_ID, Name, Class, Date, Track_ID)\nWrite a SQL query to count the number of tracks."
inputs = tokenizer.encode(text, return_tensors="pt").to(device)
output_ids = model.generate(inputs, max_length=512)
response_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
# SELECT COUNT( * ) FROM track
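The prompt above follows a fixed layout: a header line, one line per table in the form `table (col1, col2, ...)`, and a final `Write a SQL query to ...` line. A minimal sketch of a helper that assembles this layout is shown below. The name `build_prompt` and its arguments are hypothetical and not part of the model or of `transformers`; the layout is inferred only from the example prompt.

```python
def build_prompt(schema, question):
    """Format tables and a request in the prompt layout used above.

    schema: dict mapping table name -> list of column names (insertion order is kept).
    question: the request text, phrased so that it follows "Write a SQL query to".
    Hypothetical helper; the layout is inferred from the example prompt.
    """
    lines = ["Given the following schema:"]
    for table, columns in schema.items():
        # One line per table: "name (col1, col2, ...)"
        lines.append(f"{table} ({', '.join(columns)})")
    lines.append(f"Write a SQL query to {question}")
    return "\n".join(lines)


schema = {
    "track": ["Track_ID", "Name", "Location", "Seating", "Year_Opened"],
    "race": ["Race_ID", "Name", "Class", "Date", "Track_ID"],
}
text = build_prompt(schema, "count the number of tracks.")
```

The resulting `text` is identical to the hard-coded prompt string and can be passed to `tokenizer.encode` in the same way.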
How to Train
Dataset:
- https://huggingface.co/datasets/laion/OIG#unified_sqlv1jsonl-17000
- https://huggingface.co/datasets/laion/OIG#unified_sqlv2jsonl24000
{
"text":"<human>: Given the following schema:\nlocation (restaurant_id, house_number, street_name, city_name)\nrestaurant (id, name, food_type, city_name, rating)\ngeographic (city_name, county, region)\nWrite a SQL query to give me some good arabic -s on buchanan in san francisco ?\n<bot>: SELECT location.house_number , restaurant.name FROM location , restaurant WHERE location.city_name = \"san francisco\" AND location.street_name = \"buchanan\" AND restaurant.food_type = \"arabic\" AND restaurant.id = location.restaurant_id AND restaurant.rating > 2.5 ;",
"metadata":{
"source":"unified_sqlv1"
}
}
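Each record stores the prompt and the SQL answer in a single `text` field, separated by the `<human>: ` and `<bot>: ` markers. The sketch below shows one way to split a JSONL line into a (source, target) pair for sequence-to-sequence fine-tuning. It assumes this split is how training pairs are derived, which the model card does not state; `split_record` is a hypothetical helper, and the sample record is a shortened illustration, not a row from the dataset.

```python
import json

HUMAN_TAG = "<human>: "
BOT_TAG = "\n<bot>: "


def split_record(line):
    """Split one JSONL line into (prompt, sql).

    Hypothetical helper: assumes the "text" field starts with "<human>: "
    and contains a single "\n<bot>: " marker before the SQL answer.
    """
    record = json.loads(line)
    prompt, _, sql = record["text"].partition(BOT_TAG)
    if not prompt.startswith(HUMAN_TAG) or not sql:
        raise ValueError("unexpected record layout")
    return prompt[len(HUMAN_TAG):], sql.strip()


# Shortened, illustrative record in the same shape as the dataset example.
sample = json.dumps({
    "text": "<human>: Given the following schema:\ntrack (Track_ID, Name)\n"
            "Write a SQL query to count the number of tracks.\n"
            "<bot>: SELECT COUNT( * ) FROM track",
    "metadata": {"source": "unified_sqlv1"},
})
source, target = split_record(sample)
```

Here `source` has the same layout as the inference prompt in How to Use, and `target` is the SQL string the model would be trained to produce.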