---
license: apache-2.0
library_name: transformers
---
# Mistral-7B-Instruct-SQL-ian
## About the Model
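This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.3, trained on the [gretelai/synthetic_text_to_sql](https://huggingface.co/datasets/gretelai/synthetic_text_to_sql) dataset to generate SQL queries from a natural-language question and a database schema.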
- **Model Name:** Mistral-7B-Instruct-SQL-ian
- **Developed by:** kubwa
- **Base Model Name:** mistralai/Mistral-7B-Instruct-v0.3
- **Base Model URL:** [Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)
- **Base Model Description:** The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruction fine-tuned version of Mistral-7B-v0.3. Compared to Mistral-7B-v0.2, Mistral-7B-v0.3 has the following changes:
  - Extended vocabulary to 32768
  - Supports v3 Tokenizer
  - Supports function calling
- **Dataset Name:** gretelai/synthetic_text_to_sql
- **Dataset URL:** [synthetic_text_to_sql](https://huggingface.co/datasets/gretelai/synthetic_text_to_sql)
- **Dataset Description:** gretelai/synthetic_text_to_sql is a rich dataset of high-quality synthetic Text-to-SQL samples, designed and generated using Gretel Navigator and released under Apache 2.0.
## Prompt Template
```
<s>
### Instruction:
{question}
### Context:
{schema}
### Response:
```
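The template can be filled programmatically. The following is a minimal sketch; `build_prompt` is a hypothetical helper name, not part of the model or of `transformers`, and it assumes the line layout shown in the template, with one newline after each section.

```python
def build_prompt(question: str, schema: str) -> str:
    """Fill the prompt template with a question and a schema.

    `build_prompt` is an illustrative helper, not an API of this model.
    """
    return (
        "<s>\n"
        "### Instruction:\n"
        f"{question}\n"
        "### Context:\n"
        f"{schema}\n"
        "### Response:\n"
    )


# Example: the schema is passed as plain SQL DDL text.
prompt = build_prompt(
    "How many salespeople are there?",
    "CREATE TABLE salesperson (salesperson_id INT, name TEXT, region TEXT);",
)
print(prompt)
```

The returned string ends with the empty `### Response:` section, which is where the model is expected to continue with the SQL query.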
## How to Use it
```python
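from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("kubwa/Mistral-7B-Instruct-SQL-ian")
tokenizer = AutoTokenizer.from_pretrained("kubwa/Mistral-7B-Instruct-SQL-ian", use_fast=False)

# Prompt in the template format: question, schema, then an empty response section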
text = """<s>
### Instruction:
What is the total volume of timber sold by each salesperson, sorted by salesperson?
### Context:
CREATE TABLE salesperson (salesperson_id INT, name TEXT, region TEXT); INSERT INTO salesperson (salesperson_id, name, region) VALUES (1, 'John Doe', 'North'), (2, 'Jane Smith', 'South'); CREATE TABLE timber_sales (sales_id INT, salesperson_id INT, volume REAL, sale_date DATE); INSERT INTO timber_sales (sales_id, salesperson_id, volume, sale_date) VALUES (1, 1, 120, '2021-01-01'), (2, 1, 150, '2021-02-01'), (3, 2, 180, '2021-01-01');
### Response:
"""
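# Run on the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Tokenize the prompt and move the tensors to the same device as the model
inputs = tokenizer(text, return_tensors="pt")
inputs = {key: value.to(device) for key, value in inputs.items()}

# Generate up to 300 new tokens; the decoded text is the prompt followed by the SQL
outputs = model.generate(**inputs, max_new_tokens=300, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))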
```
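Because `generate` returns the prompt tokens followed by the completion, the decoded string contains the instruction and context as well as the query. A minimal sketch for keeping only the SQL, assuming the `### Response:` marker from the prompt template appears in the decoded text (`extract_sql` is a hypothetical helper name):

```python
RESPONSE_MARKER = "### Response:"


def extract_sql(decoded: str) -> str:
    """Return the text after the last response marker, stripped of whitespace.

    If the marker is absent, rpartition leaves the whole input in the last
    element, so the full string is returned (stripped).
    """
    _, _, tail = decoded.rpartition(RESPONSE_MARKER)
    return tail.strip()


# Example with a hand-written string standing in for the decoded model output.
decoded = "### Instruction:\nCount rows.\n### Context:\nCREATE TABLE t (id INT);\n### Response:\nSELECT COUNT(*) FROM t;\n"
print(extract_sql(decoded))
```

In the snippet above this would be applied to the result of `tokenizer.decode(outputs[0], skip_special_tokens=True)`.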
## Example Output
```
### Instruction:
What is the total volume of timber sold by each salesperson, sorted by salesperson?
### Context:
CREATE TABLE salesperson (salesperson_id INT, name TEXT, region TEXT); INSERT INTO salesperson (salesperson_id, name, region) VALUES (1, 'John Doe', 'North'), (2, 'Jane Smith', 'South'); CREATE TABLE timber_sales (sales_id INT, salesperson_id INT, volume REAL, sale_date DATE); INSERT INTO timber_sales (sales_id, salesperson_id, volume, sale_date) VALUES (1, 1, 120, '2021-01-01'), (2, 1, 150, '2021-02-01'), (3, 2, 180, '2021-01-01');
### Response:
SELECT salesperson.name, SUM(timber_sales.volume) as total_volume FROM salesperson JOIN timber_sales ON salesperson.salesperson_id = timber_sales.salesperson_id GROUP BY salesperson.name ORDER BY total_volume DESC;
```
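One way to sanity-check a generated query is to run it against the schema from the context. The sketch below uses Python's standard `sqlite3` module with an in-memory database; `run_generated_sql` is a hypothetical helper name, and SQLite is an assumption here, since the card does not state which SQL dialect the model targets.

```python
import sqlite3


def run_generated_sql(schema_sql: str, query: str) -> list:
    """Create the tables from schema_sql in an in-memory SQLite database,
    run the query, and return all result rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_sql)  # runs the CREATE TABLE / INSERT statements
        return conn.execute(query).fetchall()
    finally:
        conn.close()


# Schema and query copied from the example output.
schema = (
    "CREATE TABLE salesperson (salesperson_id INT, name TEXT, region TEXT); "
    "INSERT INTO salesperson (salesperson_id, name, region) "
    "VALUES (1, 'John Doe', 'North'), (2, 'Jane Smith', 'South'); "
    "CREATE TABLE timber_sales (sales_id INT, salesperson_id INT, volume REAL, sale_date DATE); "
    "INSERT INTO timber_sales (sales_id, salesperson_id, volume, sale_date) "
    "VALUES (1, 1, 120, '2021-01-01'), (2, 1, 150, '2021-02-01'), (3, 2, 180, '2021-01-01');"
)
query = (
    "SELECT salesperson.name, SUM(timber_sales.volume) as total_volume "
    "FROM salesperson JOIN timber_sales "
    "ON salesperson.salesperson_id = timber_sales.salesperson_id "
    "GROUP BY salesperson.name ORDER BY total_volume DESC;"
)

rows = run_generated_sql(schema, query)
print(rows)  # [('John Doe', 270.0), ('Jane Smith', 180.0)]
```

Running the query also makes mismatches with the question easier to spot: here the result is ordered by total volume, whereas the question asks for sorting by salesperson.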
## Hardware and Software
- **Training Hardware:** 4 Tesla V100-PCIE-32GB GPUs
## License
- Apache-2.0