metadata
base_model: mistralai/Mistral-7B-Instruct-v0.3
datasets:
  - generator
library_name: peft
license: apache-2.0
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: Mistral-7B-text-to-sql-flash-attention-2-dataeval
    results: []

Mistral-7B-text-to-sql-flash-attention-2-dataeval

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.3 on the generator dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4605

Model description

Article: https://medium.com/@frankmorales_91352/fine-tuning-the-llm-mistral-7b-instruct-v0-3-249c1814ceaf

Training and evaluation data

Fine Tuning and Evaluation: https://github.com/frank-morales2020/MLxDL/blob/main/FineTuning_LLM_Mistral_7B_Instruct_v0_1_for_text_to_SQL_EVALDATA.ipynb

Evaluation: https://github.com/frank-morales2020/MLxDL/blob/main/Evaluator_Mistral_7B_text_to_sql.ipynb

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 3
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 24
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • lr_scheduler_warmup_ratio: 0.03
  • lr_scheduler_warmup_steps: 15
  • num_epochs: 3
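The total_train_batch_size listed above follows from the per-device batch size and the gradient accumulation steps. A minimal sketch of that arithmetic, assuming a single training device (the device count is not stated in this card, and `effective_batch_size` and `num_devices` are illustrative names):

```python
def effective_batch_size(per_device_batch_size, gradient_accumulation_steps, num_devices=1):
    """Number of examples contributing to one optimizer update."""
    # num_devices=1 is an assumption; the card does not state the hardware used.
    return per_device_batch_size * gradient_accumulation_steps * num_devices


# Values taken from the hyperparameter list above.
total = effective_batch_size(per_device_batch_size=3, gradient_accumulation_steps=8)
print(total)  # 24
```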

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="Mistral-7B-text-to-sql-flash-attention-2-dataeval",
    num_train_epochs=3,  # number of training epochs
    per_device_train_batch_size=3,  # batch size per device during training
    gradient_accumulation_steps=8,  # number of steps accumulated before each optimizer update
    gradient_checkpointing=True,  # use gradient checkpointing to save memory
    optim="adamw_torch_fused",  # use fused AdamW optimizer
    logging_steps=10,  # log every 10 steps
    # save_strategy="epoch",  # save checkpoint every epoch
    learning_rate=2e-4,  # learning rate, based on QLoRA paper
    bf16=True,  # use bfloat16 precision
    tf32=True,  # use tf32 precision
    max_grad_norm=0.3,  # max gradient norm, based on QLoRA paper
    warmup_ratio=0.03,  # warmup ratio, based on QLoRA paper
    weight_decay=0.01,
    lr_scheduler_type="constant",  # use constant learning rate scheduler
    push_to_hub=True,  # push model to hub
    report_to="tensorboard",  # report metrics to tensorboard
    hub_token=access_token_write,  # write-access Hub token; must be defined before this call
    load_best_model_at_end=True,
    logging_dir="/content/gdrive/MyDrive/model/Mistral-7B-text-to-sql-flash-attention-2-dataeval/logs",
    evaluation_strategy="steps",
    eval_steps=10,
    save_strategy="steps",
    save_steps=10,
    metric_for_best_model="loss",
    warmup_steps=15,
)
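Note on the warmup settings: as far as the Transformers scheduler behaviour goes, the "constant" scheduler holds the learning rate fixed from the first step, so warmup_ratio and warmup_steps would only matter under "constant_with_warmup". The sketch below is a plain-Python illustration of the two shapes, not the library code. The function names are hypothetical and the default values are copied from the arguments above.

```python
def constant_lr(step, base_lr=2e-4):
    """Learning rate under a constant schedule: the same value at every step."""
    return base_lr


def constant_with_warmup_lr(step, base_lr=2e-4, warmup_steps=15):
    """Linear ramp from 0 to base_lr over warmup_steps, then constant."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr


# Compare the two schedules at a few optimizer steps.
for step in (0, 3, 15, 70):
    print(step, constant_lr(step), constant_with_warmup_lr(step))
```

If a warmup phase was intended, lr_scheduler_type="constant_with_warmup" would be the setting to try.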

Training results

Training Loss   Epoch    Step   Validation Loss
1.8612          0.4020   10     0.6092
0.5849          0.8040   20     0.5307
0.4937          1.2060   30     0.4887
0.4454          1.6080   40     0.4670
0.4250          2.0101   50     0.4544
0.3498          2.4121   60     0.4717
0.3439          2.8141   70     0.4605
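The log can be scanned for the evaluation step with the lowest validation loss, which is the criterion that load_best_model_at_end=True with metric_for_best_model="loss" uses to pick a checkpoint. A minimal sketch with the rows above transcribed by hand; the tuple layout and variable names are illustrative choices:

```python
# (step, epoch, training_loss, validation_loss), copied from the results table.
results = [
    (10, 0.4020, 1.8612, 0.6092),
    (20, 0.8040, 0.5849, 0.5307),
    (30, 1.2060, 0.4937, 0.4887),
    (40, 1.6080, 0.4454, 0.4670),
    (50, 2.0101, 0.4250, 0.4544),
    (60, 2.4121, 0.3498, 0.4717),
    (70, 2.8141, 0.3439, 0.4605),
]

# Lower validation loss is better, so take the minimum over the last column.
best_step, best_epoch, _, best_val_loss = min(results, key=lambda row: row[3])
print(best_step, best_val_loss)  # 50 0.4544
```

The headline loss of 0.4605 matches the last logged row (step 70), while the lowest validation loss in the log is 0.4544 at step 50.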

Framework versions

  • PEFT 0.11.1
  • Transformers 4.41.2
  • Pytorch 2.3.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
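To compare a local environment against the versions listed above, a small check along these lines may help. It is a sketch only: the pip package names (for example `torch` for Pytorch) and the helper names are assumptions, not part of the original card.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above; the keys are the assumed pip package names.
PINNED = {
    "peft": "0.11.1",
    "transformers": "4.41.2",
    "torch": "2.3.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_mismatches(pinned=PINNED, lookup=installed_version):
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in version_mismatches().items():
        print(f"{name}: card lists {wanted}, found {found}")
```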