---
license: apache-2.0
library_name: peft
tags:
- trl
- sft
- generated_from_trainer
base_model: meta-llama/Meta-Llama-3-8B-Instruct
datasets:
- b-mc2/sql-create-context
model-index:
- name: llama3-8b-instruct-text-to-sql
results: []
metrics:
- accuracy 79.90
language:
- en
---
# llama3-8b-instruct-text-to-sql
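This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the [b-mc2/sql-create-context](https://huggingface.co/datasets/b-mc2/sql-create-context) dataset. It translates English questions about a given database schema into SQL queries.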
## Training and evaluation data
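Training and evaluation used the `b-mc2/sql-create-context` dataset, in which each record pairs a natural-language question with its `CREATE TABLE` context and the corresponding SQL answer.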
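
As a rough illustration of how one dataset record could be turned into a chat-style training example, here is a minimal sketch. The field names `context`, `question`, and `answer` are those of the dataset; the helper `to_chat_messages` and the `SYSTEM_TEMPLATE` constant are hypothetical names, and the message layout is only inferred from the example output in the Usage section, not taken from the actual training script.

```python
# Hypothetical system prompt template, inferred from the example output
# shown in the Usage section (the schema is appended after "SCHEMA:").
SYSTEM_TEMPLATE = (
    "You are an text to SQL query translator. Users will ask you questions "
    "in English and you will generate a SQL query based on the provided SCHEMA.\n"
    "SCHEMA:\n{schema}"
)


def to_chat_messages(sample: dict) -> list:
    """Map one record (context, question, answer) to a list of chat messages."""
    return [
        {"role": "system", "content": SYSTEM_TEMPLATE.format(schema=sample["context"])},
        {"role": "user", "content": sample["question"]},
        {"role": "assistant", "content": sample["answer"]},
    ]


# Record taken from the example output in the Usage section.
sample = {
    "context": "CREATE TABLE match_season (College VARCHAR, POSITION VARCHAR)",
    "question": "Which college have both players with position midfielder "
                "and players with position defender?",
    "answer": 'SELECT College FROM match_season WHERE POSITION = "Midfielder" '
              'INTERSECT SELECT College FROM match_season WHERE POSITION = "Defender"',
}

messages = to_chat_messages(sample)
for message in messages:
    print(f'{message["role"]}: {message["content"]}')
```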
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 3
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 6
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 3
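
The relationship between the batch-size values above can be checked with a little arithmetic. This is a sketch that assumes training ran on a single device, which is consistent with the reported numbers but is not stated explicitly in this card; the variable names are illustrative.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 3             # per-device batch size
gradient_accumulation_steps = 2
num_devices = 1                  # assumption: single-device training

# Effective number of samples contributing to each optimizer step.
total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)
print(total_train_batch_size)  # 6
```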
### Framework versions
- PEFT 0.10.0
- Transformers 4.40.0
- Pytorch 2.2.0+cu121
- Datasets 2.19.0
- Tokenizers 0.19.1
### Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "ByteForge/Llama_3_8b_Instruct_Text2Sql_FullPrecision_Finetuned"

# Load the tokenizer and the model in bfloat16, placing weights automatically.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Schema and question passed as the user message.
prompt = """
CREATE TABLE stadium (
    stadium_id number,
    location text,
    name text,
    capacity number,
    highest number,
    lowest number,
    average number
)
CREATE TABLE singer (
    singer_id number,
    name text,
    country text,
    song_name text,
    song_release_year text,
    age number,
    is_male others
)
CREATE TABLE concert (
    concert_id number,
    concert_name text,
    theme text,
    stadium_id text,
    year text
)
CREATE TABLE singer_in_concert (
    concert_id number,
    singer_id text
)
-- Using valid SQLite, answer the following questions for the tables provided above.
-- What is the maximum, the average, and the minimum capacity of stadiums? (Generate 1 SQL query. No explanation needed)
answer:
"""

messages = [
    {"role": "system", "content": "You are an text to SQL query translator. Users will ask you questions in English and you will generate a SQL query based on the provided SCHEMA."},
    {"role": "user", "content": prompt},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Stop on either the regular EOS token or Llama 3's end-of-turn token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# outputs[0] holds the prompt tokens followed by the generated tokens,
# so the decoded text contains the full conversation.
response = outputs[0]
print(tokenizer.decode(response, skip_special_tokens=True))

# Example of the decoded output format (shown for a different schema and question):
#
# system
# You are an text to SQL query translator. Users will ask you questions in English and you will generate a SQL query based on the provided SCHEMA.
# SCHEMA:
# CREATE TABLE match_season (College VARCHAR, POSITION VARCHAR)
# user
# Which college have both players with position midfielder and players with position defender?
# assistant
# SELECT College FROM match_season WHERE POSITION = "Midfielder" INTERSECT SELECT College FROM match_season WHERE POSITION = "Defender"
#
```
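
Because the prompt asks for valid SQLite, it can be useful to check a generated query against the schema before using it. The following is a minimal sketch using only Python's standard `sqlite3` module; the helper name `is_valid_sqlite` is hypothetical, the query strings are hand-written stand-ins rather than real model output, and the schema is the `stadium` table from the prompt with a terminating semicolon added.

```python
import sqlite3


def is_valid_sqlite(schema: str, query: str) -> bool:
    """Return True if `query` compiles against `schema` in an in-memory SQLite DB."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)        # create the (empty) tables
        conn.execute(f"EXPLAIN {query}")  # compile the query without running it
        return True
    except (sqlite3.Error, sqlite3.Warning):
        return False
    finally:
        conn.close()


schema = """
CREATE TABLE stadium (
    stadium_id number,
    location text,
    name text,
    capacity number,
    highest number,
    lowest number,
    average number
);
"""

# Hand-written stand-ins for model output (not actual generations).
good_query = "SELECT max(capacity), avg(capacity), min(capacity) FROM stadium"
bad_query = "SELECT max(seats) FROM stadium"  # column does not exist

print(is_valid_sqlite(schema, good_query))
print(is_valid_sqlite(schema, bad_query))
```

A check like this only confirms that the query parses and references existing tables and columns; it does not confirm that the query answers the question correctly.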