---
base_model: mistralai/Mistral-7B-Instruct-v0.3
datasets:
- generator
library_name: peft
license: apache-2.0
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: Mistral-7B-text-to-sql-flash-attention-2-dataeval
results: []
---
# Mistral-7B-text-to-sql-flash-attention-2-dataeval
This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) on the generator dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4605
- Perplexity: 10.40

Perplexity measures how uncertain (or "surprised") the model is about its predictions; it is derived from the probabilities the model assigns to the tokens it predicts.

Perplexity references:
- https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf
- https://medium.com/@AyushmanPranav/perplexity-calculation-in-nlp-0699fbda4594

A perplexity of 10.40 on the evaluation set indicates that the fine-tuned Mistral-7B model has a reasonable grasp of natural language and SQL syntax. However, further evaluation with task-specific metrics is needed to assess the model's effectiveness in real-world scenarios. Combining a quantitative metric such as perplexity with qualitative analysis of the generated queries gives a fuller picture of the model's strengths and weaknesses, ultimately leading to better performance and more reliable text-to-SQL translation.
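As a rough illustration, perplexity can be computed as the exponential of the average per-token negative log-likelihood over an evaluation set. The sketch below is a minimal example, not the exact evaluation procedure (see the notebooks linked under "Training and evaluation data"); the model id and sample texts are placeholders:

```python
# Minimal sketch: perplexity = exp(mean per-token negative log-likelihood).
# The model id and sample texts are placeholders; swap in the fine-tuned adapter as needed.
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
model.eval()

def perplexity(texts):
    total_nll, total_tokens = 0.0, 0
    for text in texts:
        enc = tokenizer(text, return_tensors="pt").to(model.device)
        with torch.no_grad():
            # With labels=input_ids, the returned loss is the mean cross-entropy over predicted tokens.
            out = model(**enc, labels=enc["input_ids"])
        n_predicted = enc["input_ids"].size(1) - 1  # labels are shifted by one position
        total_nll += out.loss.item() * n_predicted
        total_tokens += n_predicted
    return math.exp(total_nll / total_tokens)

print(perplexity(["How many singers do we have?", "SELECT COUNT(*) FROM singer"]))
```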
Dataset: [b-mc2/sql-create-context](https://huggingface.co/datasets/b-mc2/sql-create-context)
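For reference, here is a minimal, hypothetical sketch of how these examples (fields `question`, `context`, `answer`) can be rendered into chat-formatted training text with the Mistral chat template; the prompt wording and the train/eval split are assumptions, not necessarily what the training notebook uses:

```python
# Hypothetical preprocessing of b-mc2/sql-create-context into chat-formatted text for SFT.
# The prompt wording and the 95/5 train/eval split are assumptions.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")
dataset = load_dataset("b-mc2/sql-create-context", split="train")

def to_text(sample):
    # "context" holds the CREATE TABLE statement(s), "question" the natural-language request,
    # and "answer" the target SQL query.
    messages = [
        {
            "role": "user",
            "content": (
                f"Given the schema:\n{sample['context']}\n\n"
                f"Write a SQL query that answers: {sample['question']}"
            ),
        },
        {"role": "assistant", "content": sample["answer"]},
    ]
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

dataset = dataset.map(to_text, remove_columns=dataset.column_names)
split = dataset.train_test_split(test_size=0.05, seed=42)
print(split["train"][0]["text"])
```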
## Model description
Article: https://medium.com/@frankmorales_91352/fine-tuning-the-llm-mistral-7b-instruct-v0-3-249c1814ceaf
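As a quick-start sketch (not the exact code from the article), the LoRA adapter can be loaded on top of the base model with Flash Attention 2 and used to generate a query. The adapter repository id and the prompt are placeholders, and flash-attn plus a compatible GPU are assumed:

```python
# Illustrative inference sketch: load the LoRA adapter on top of the base model
# with Flash Attention 2 and generate a SQL query. The adapter id is a placeholder.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

base_id = "mistralai/Mistral-7B-Instruct-v0.3"
adapter_id = "Mistral-7B-text-to-sql-flash-attention-2-dataeval"  # placeholder: use the full hub id of this repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="flash_attention_2",  # requires flash-attn and a supported GPU
)

prompt = (
    "Given the schema:\nCREATE TABLE singer (singer_id INT, name TEXT, age INT)\n\n"
    "Write a SQL query that answers: How many singers are older than 30?"
)
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}], add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```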
## Training and evaluation data
Fine-tuning and evaluation notebook: https://github.com/frank-morales2020/MLxDL/blob/main/FineTuning_LLM_Mistral_7B_Instruct_v0_1_for_text_to_SQL_EVALDATA.ipynb

Evaluation notebook: https://github.com/frank-morales2020/MLxDL/blob/main/Evaluator_Mistral_7B_text_to_sql.ipynb

Evaluation article with ChromaDB: https://medium.com/@frankmorales_91352/a-comprehensive-evaluation-of-a-fine-tuned-text-to-sql-model-from-code-to-results-with-7ea59943b0a1

Evaluation article with ChromaDB, PostgreSQL and the "gretelai/synthetic_text_to_sql" dataset:
https://medium.com/@frankmorales_91352/evaluating-the-performance-of-a-fine-tuned-text-to-sql-model-6b7d61dcfef5

The latter article discusses the evaluation of this fine-tuned text-to-SQL model, which translates natural-language questions into SQL queries. The model was trained on the "b-mc2/sql-create-context" dataset and evaluated against the "gretelai/synthetic_text_to_sql" dataset.
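A greatly simplified version of such an evaluation loop is sketched below. It only checks a normalized exact match, whereas the linked articles also use ChromaDB similarity and PostgreSQL execution checks; the split name, the column names (`sql_prompt`, `sql_context`, `sql`) and the reuse of `model`/`tokenizer` from the inference sketch above are assumptions:

```python
# Hypothetical exact-match evaluation over gretelai/synthetic_text_to_sql.
# Assumes `model` and `tokenizer` from the inference sketch above are already loaded.
from datasets import load_dataset

eval_ds = load_dataset("gretelai/synthetic_text_to_sql", split="test").select(range(100))

def generate_sql(question: str, schema: str) -> str:
    prompt = f"Given the schema:\n{schema}\n\nWrite a SQL query that answers: {question}"
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}], add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=128, do_sample=False)
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)

def normalize(sql: str) -> str:
    # Crude normalization: lowercase, drop semicolons, collapse whitespace.
    return " ".join(sql.lower().replace(";", " ").split())

matches = sum(
    normalize(generate_sql(s["sql_prompt"], s["sql_context"])) == normalize(s["sql"])
    for s in eval_ds
)
print(f"Exact-match accuracy on 100 samples: {matches / len(eval_ds):.2%}")
```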
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 3
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 24
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.03
- lr_scheduler_warmup_steps: 15
- num_epochs: 3
The corresponding `TrainingArguments` from the training notebook:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="Mistral-7B-text-to-sql-flash-attention-2-dataeval",
    num_train_epochs=3,                     # number of training epochs
    per_device_train_batch_size=3,          # batch size per device during training
    gradient_accumulation_steps=8,          # number of steps before performing a backward/update pass
    gradient_checkpointing=True,            # use gradient checkpointing to save memory
    optim="adamw_torch_fused",              # use the fused AdamW optimizer
    logging_steps=10,                       # log every ten steps
    learning_rate=2e-4,                     # learning rate, based on the QLoRA paper
    bf16=True,                              # use bfloat16 precision
    tf32=True,                              # use tf32 precision
    max_grad_norm=0.3,                      # max gradient norm, based on the QLoRA paper
    warmup_ratio=0.03,                      # warmup ratio, based on the QLoRA paper
    weight_decay=0.01,                      # L2 weight decay
    lr_scheduler_type="constant",           # use a constant learning-rate scheduler
    push_to_hub=True,                       # push the model to the Hub
    report_to="tensorboard",                # report metrics to TensorBoard
    hub_token=access_token_write,           # Hugging Face write token (defined earlier in the notebook)
    load_best_model_at_end=True,            # reload the checkpoint with the best validation loss at the end
    logging_dir="/content/drive/MyDrive/model/Mistral-7B-text-to-sql-flash-attention-2-dataeval/logs",
    evaluation_strategy="steps",            # evaluate every eval_steps
    eval_steps=10,
    save_strategy="steps",                  # save a checkpoint every save_steps
    save_steps=10,
    metric_for_best_model="loss",           # select the best checkpoint by validation loss
    warmup_steps=15,
)
```
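For completeness, here is a minimal sketch of how these arguments might be wired into TRL's `SFTTrainer` with a QLoRA setup. The quantization settings, LoRA hyperparameters, sequence length and dataset variables are assumptions; the exact configuration is in the fine-tuning notebook linked above.

```python
# Illustrative QLoRA + SFTTrainer wiring for the TrainingArguments above.
# Quantization, LoRA values, max_seq_length and the dataset variables are assumptions.
import torch
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTTrainer

model_id = "mistralai/Mistral-7B-Instruct-v0.3"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

peft_config = LoraConfig(           # illustrative LoRA hyperparameters
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules="all-linear",
)

trainer = SFTTrainer(
    model=model,
    args=args,                      # the TrainingArguments defined above
    train_dataset=train_dataset,    # formatted splits, e.g. from the preprocessing sketch above
    eval_dataset=eval_dataset,
    dataset_text_field="text",
    max_seq_length=2048,            # illustrative
    packing=True,                   # packing is likely why the card lists the dataset as "generator"
    peft_config=peft_config,
    tokenizer=tokenizer,
)
trainer.train()
```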
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.8612 | 0.4020 | 10 | 0.6092 |
| 0.5849 | 0.8040 | 20 | 0.5307 |
| 0.4937 | 1.2060 | 30 | 0.4887 |
| 0.4454 | 1.6080 | 40 | 0.4670 |
| 0.4250        | 2.0101 | 50   | 0.4544          |
| 0.3498 | 2.4121 | 60 | 0.4717 |
| 0.3439 | 2.8141 | 70 | 0.4605 |
### Framework versions
- PEFT 0.11.1
- Transformers 4.41.2
- Pytorch 2.3.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1