Taking some time to generate results
The model is taking around 10 minutes to generate results (using the original sample schema provided).
Does it support multiple tables?
Hi
It does support multiple tables. I have updated the prompt in the model card.
Yes, speed will be slow if you are on CPU. You need a GPU to run this model in seconds, as this model is full-precision, not quantized.
If you are looking for a more powerful model that you can run on CPU, use this model of mine: "https://huggingface.co/ByteForge/Defog_llama-3-sqlcoder-8b-ct2-int8_float16". It's more powerful and usually takes 20-30 seconds on CPU.
Hit me up if you face any issues
Thanks
Hey! Thanks, buddy. The model you linked is now taking around 1.5 minutes to generate.
I am using a virtual machine with 12 CPUs & 64 GB RAM. Can you provide the hardware requirements (that you are using) for both models?
If possible, can we have a session for better understanding, whenever you are available?
Hi, following up....
Hi, sorry for the delayed response; I was busy. Yes, Defog_llama3 takes less time and gives more accurate results. But you can further drastically reduce the response time to seconds by instructing the model to output only the SQL query, with no explanation.
The infra you are using is perfect; it should give a response in seconds.
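To illustrate the idea of constraining the output (a minimal sketch; `build_prompt` and its schema/question strings are placeholders for illustration, not part of either model's code):

```python
# Sketch: append an explicit output-format instruction so the model emits
# only the SQL query. Generation time scales with the number of output
# tokens, so suppressing the explanation cuts latency directly.
def build_prompt(schema: str, question: str) -> str:
    return (
        f"{schema}\n"
        "-- Using valid SQL, answer the following question for the tables provided above.\n"
        f"-- {question} (Generate 1 SQL query. No explanation needed.)\n"
        "answer:\n"
    )

print(build_prompt("CREATE TABLE t (id int)", "how many rows are in t?"))
```

The resulting string is what you pass as the user message; the same pattern appears in the prompts below.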
On the same machine, the Defog_llama3 model is taking 5 minutes. I am not able to pinpoint the exact issue here.
If possible, can we have a session for better understanding, whenever you are available?
Hi
Can you share your code snippet? I will suggest changes. Also mention your CPU specifications.
I am using a 16-core CPU & 64 GB RAM Linux virtual machine. Below is the code I am using. (I have already downloaded and stored the model; it works fine but takes 5 minutes to generate a response.)
import ctranslate2
import transformers
from huggingface_hub import snapshot_download

# model_id = "ByteForge/Defog_llama-3-sqlcoder-8b-ct2-int8_float16"
local_model_path = "./local_model"

# Initialize the model and tokenizer from local paths
model = ctranslate2.Generator(local_model_path)
tokenizer = transformers.AutoTokenizer.from_pretrained(local_model_path, local_files_only=True)

# model_path = snapshot_download(model_id)
# print('------------------------------', model_path, '-----------------------------')
# model = ctranslate2.Generator(model_path)
# tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
prompt="""
CREATE TABLE chapter (
  chapter_id int NOT NULL AUTO_INCREMENT,
  chapter_name varchar(300) NOT NULL,
  class_id int DEFAULT NULL,
  subject_id int DEFAULT NULL,
  curriculum_id int DEFAULT NULL,
  PRIMARY KEY (chapter_id),
  KEY class_id (class_id),
  KEY subject_id (subject_id),
  KEY curriculum_id_idx (curriculum_id),
  CONSTRAINT chapter_ibfk_1 FOREIGN KEY (class_id) REFERENCES class (class_id),
  CONSTRAINT chapter_ibfk_2 FOREIGN KEY (subject_id) REFERENCES subject (subject_id),
  CONSTRAINT curriculum_id FOREIGN KEY (curriculum_id) REFERENCES curriculum (curriculum_id)
)
CREATE TABLE class (
  class_id int NOT NULL AUTO_INCREMENT,
  class_name varchar(20) DEFAULT NULL,
  curriculum_id int DEFAULT NULL,
  PRIMARY KEY (class_id),
  UNIQUE KEY class_name (class_name),
  KEY curriculum_id (curriculum_id),
  CONSTRAINT class_ibfk_1 FOREIGN KEY (curriculum_id) REFERENCES curriculum (curriculum_id)
)
CREATE TABLE content (
  id int NOT NULL AUTO_INCREMENT,
  teaching_type tinyint(1) NOT NULL,
  subjective_type tinyint(1) NOT NULL,
  objective_type tinyint(1) NOT NULL,
  title longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
  description longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
  marks_assigned int DEFAULT NULL,
  option1 longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
  option2 longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
  option3 longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
  option4 longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
  answer longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
  chapter_id int NOT NULL,
  class_name_id int NOT NULL,
  curriculum_id int NOT NULL,
  subject_id int NOT NULL,
  PRIMARY KEY (id),
  KEY content_class_name_id_0bf9f7f2_fk_reference_classes_id (class_name_id),
  KEY content_curriculum_id_f2f5d8c8_fk_curriculum_curr_id (curriculum_id),
  KEY content_subject_id_d311def1_fk_subjects_subject_id (subject_id),
  KEY content_chapter_id_92994b0c_fk_chapter_id (chapter_id)
)
CREATE TABLE curriculum (
  curriculum_id int NOT NULL AUTO_INCREMENT,
  curriculum_name varchar(200) NOT NULL,
  country varchar(100) NOT NULL,
  curr_class_prefix varchar(100) NOT NULL,
  curr_section_prefix varchar(100) NOT NULL,
  PRIMARY KEY (curriculum_id)
)
CREATE TABLE student_chapter_progress (
  s_chapter_progress_id int NOT NULL AUTO_INCREMENT,
  student_id int NOT NULL,
  chapter_id int NOT NULL,
  subject_id int DEFAULT NULL,
  progress int NOT NULL,
  PRIMARY KEY (s_chapter_progress_id),
  KEY student_id_idx (student_id),
  KEY chapter_id_idx (chapter_id),
  KEY subject_id_idx (subject_id),
  CONSTRAINT fk_chapter_id_chapter_progress FOREIGN KEY (chapter_id) REFERENCES chapter (chapter_id),
  CONSTRAINT fk_student_id_chapter_progress FOREIGN KEY (student_id) REFERENCES students (student_id),
  CONSTRAINT fk_subject_id_chapter_progress FOREIGN KEY (subject_id) REFERENCES subject (subject_id)
)
CREATE TABLE student_quiz_analytics (
  student_quiz_id int NOT NULL AUTO_INCREMENT,
  student_email_id varchar(100) NOT NULL,
  quiz_chapter_id int NOT NULL,
  time_taken int DEFAULT NULL,
  score int DEFAULT NULL,
  num_attempts int DEFAULT NULL,
  performance varchar(45) DEFAULT NULL,
  PRIMARY KEY (student_quiz_id),
  KEY quiz_chapter_id_idx (quiz_chapter_id),
  CONSTRAINT quiz_chapter_id FOREIGN KEY (quiz_chapter_id) REFERENCES chapter (chapter_id)
)
CREATE TABLE student_reward (
  s_reward_id int NOT NULL AUTO_INCREMENT,
  student_id int NOT NULL,
  reward_points int DEFAULT NULL,
  PRIMARY KEY (s_reward_id),
  KEY student_id (student_id),
  CONSTRAINT student_id FOREIGN KEY (student_id) REFERENCES students (student_id)
)
CREATE TABLE student_subject_map (
  student_sub_id int NOT NULL AUTO_INCREMENT,
  student_id int DEFAULT NULL,
  subject_id int DEFAULT NULL,
  progress int DEFAULT NULL,
  PRIMARY KEY (student_sub_id),
  KEY student_id (student_id),
  KEY subject_id (subject_id),
  CONSTRAINT student_subject_map_ibfk_1 FOREIGN KEY (student_id) REFERENCES students (student_id),
  CONSTRAINT student_subject_map_ibfk_2 FOREIGN KEY (subject_id) REFERENCES subject (subject_id)
)
CREATE TABLE student_subtopic_status (
  s_subtopic_status_id int NOT NULL AUTO_INCREMENT,
  student_id int NOT NULL,
  subtopic_id int NOT NULL,
  status int DEFAULT NULL,
  PRIMARY KEY (s_subtopic_status_id)
)
CREATE TABLE student_topic_progress (
  s_topic_progress_id int NOT NULL AUTO_INCREMENT,
  student_id int NOT NULL,
  chapter_id int DEFAULT NULL,
  content_id int NOT NULL,
  progress int DEFAULT NULL,
  PRIMARY KEY (s_topic_progress_id),
  KEY student_id_idx (student_id),
  KEY chapter_id_idx (chapter_id),
  KEY fk_content_id (content_id),
  CONSTRAINT fk_chapter_id FOREIGN KEY (chapter_id) REFERENCES chapter (chapter_id),
  CONSTRAINT fk_content_id FOREIGN KEY (content_id) REFERENCES content (id),
  CONSTRAINT fk_student_id FOREIGN KEY (student_id) REFERENCES students (student_id)
)
CREATE TABLE students (
  student_id int NOT NULL AUTO_INCREMENT,
  name varchar(200) NOT NULL,
  date_of_birth datetime DEFAULT NULL,
  age int DEFAULT NULL,
  class_id int DEFAULT NULL,
  guardian_name varchar(200) DEFAULT NULL,
  guardian_email varchar(100) DEFAULT NULL,
  guardian_approval varchar(10) DEFAULT NULL,
  email varchar(100) DEFAULT NULL,
  mobile_number varchar(15) DEFAULT NULL,
  gender varchar(45) NOT NULL,
  alternate_mobile_number varchar(15) DEFAULT NULL,
  school_name varchar(200) DEFAULT NULL,
  curriculum_id varchar(200) DEFAULT NULL,
  address varchar(300) NOT NULL,
  zipcode varchar(100) NOT NULL,
  city varchar(100) NOT NULL,
  state varchar(200) NOT NULL,
  country varchar(100) NOT NULL,
  created_at datetime DEFAULT NULL,
  status varchar(45) DEFAULT NULL,
  password varchar(200) NOT NULL,
  picture_data json DEFAULT NULL,
  PRIMARY KEY (student_id),
  UNIQUE KEY email (email),
  UNIQUE KEY mobile_number (mobile_number),
  UNIQUE KEY alternate_mobile_number (alternate_mobile_number),
  KEY class_id (class_id),
  CONSTRAINT students_ibfk_1 FOREIGN KEY (class_id) REFERENCES class (class_id)
)
CREATE TABLE sub_topics (
  sub_topic_id int NOT NULL AUTO_INCREMENT,
  content_id int NOT NULL,
  sub_topic varchar(150) DEFAULT NULL,
  description longtext,
  PRIMARY KEY (sub_topic_id),
  KEY content_id (content_id),
  CONSTRAINT content_id FOREIGN KEY (content_id) REFERENCES content (id)
)
-- Using valid SQL, answer the following question for the tables provided above.
-- how many subtopics are in biology? (Generate 1 SQL query. No explanation needed.)
answer:
"""
messages = [
{"role": "system", "content": "You are a SQL expert. Given an input question and schema, answer with the correct SQL query"},
{"role": "user", "content": prompt},
]
input_ids = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
terminators = [
tokenizer.eos_token_id,
tokenizer.convert_tokens_to_ids("<|eot_id|>")
]
input_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(input_ids))
# results = model.generate_batch([input_tokens], include_prompt_in_result=False, max_length=256, sampling_temperature=0, sampling_topp=0.1, end_token=terminators)
results = model.generate_batch([input_tokens], include_prompt_in_result=False, max_length=256, sampling_temperature=0, end_token=terminators)
output = tokenizer.decode(results[0].sequences_ids[0])
print(output)
Yeah, I can see you are calling model.generate_batch twice, which is why it's taking twice as long to generate a response.
Please use it like below:
results = model.generate_batch([input_tokens], include_prompt_in_result=False, max_length=256, sampling_temperature=0.6, sampling_topp=0.9, end_token=terminators)
output = tokenizer.decode(results[0].sequences_ids[0])
print(output)
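If it is still slow after that, it may help to time the model load and the generation separately, to see where the minutes actually go. A minimal sketch (timed is a hypothetical helper, not part of ctranslate2; the lambda at the end stands in for a real call):

```python
import time

def timed(label, fn, *args, **kwargs):
    # Hypothetical helper: wall-clock a single call, so model loading and
    # token generation can be measured independently when diagnosing a
    # slow end-to-end response.
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - start:.2f}s")
    return result

# Real usage would wrap the two suspect calls, e.g.:
#   model = timed("load", ctranslate2.Generator, local_model_path)
#   results = timed("generate", model.generate_batch, [input_tokens], ...)
tokens = timed("split", lambda text: text.split(), "SELECT count(*) FROM sub_topics")
```

If "load" dominates, the problem is disk or model initialization; if "generate" dominates, it is decoding speed on your CPU.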
No, no, the first one is commented out....
Just now I ran your code and got the answer in 40 seconds on a 12-core CPU with 64 GB RAM.
Please check once whether you are using the code properly.
Once again, I am pasting the code for you. Please just change the model path to your local path and the prompt. Don't change anything else, and run it.
import ctranslate2
import transformers
from huggingface_hub import snapshot_download
model_id = "ByteForge/Defog_llama-3-sqlcoder-8b-ct2-int8_float16"
model_path = snapshot_download(model_id)
model = ctranslate2.Generator(model_path)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
prompt="""
CREATE TABLE stadium (
stadium_id number,
location text,
name text,
capacity number,
highest number,
lowest number,
average number
)
CREATE TABLE singer (
singer_id number,
name text,
country text,
song_name text,
song_release_year text,
age number,
is_male others
)
CREATE TABLE concert (
concert_id number,
concert_name text,
theme text,
stadium_id text,
year text
)
CREATE TABLE singer_in_concert (
concert_id number,
singer_id text
)
-- Using valid SQLite, answer the following question for the tables provided above.
-- What is the maximum, the average, and the minimum capacity of stadiums? (Generate 1 SQL query. No explanation needed.)
answer:
"""
messages = [
{"role": "system", "content": "You are a SQL expert. Given an input question and schema, answer with the correct SQL query"},
{"role": "user", "content": prompt},
]
input_ids = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
terminators = [
tokenizer.eos_token_id,
tokenizer.convert_tokens_to_ids("<|eot_id|>")
]
input_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(input_ids))
results = model.generate_batch([input_tokens], include_prompt_in_result=False, max_length=256, sampling_temperature=0.6, sampling_topp=0.9, end_token=terminators)
output = tokenizer.decode(results[0].sequences_ids[0])
print(output)