frankmorales2020 committed on
Commit
81230bf
1 Parent(s): 8ab2d00

Update README.md

Files changed (1)
  1. README.md +35 -5
README.md CHANGED
@@ -24,17 +24,14 @@ It achieves the following results on the evaluation set:
 
 ## Model description
 
- More information needed
+ Article: https://medium.com/@frankmorales_91352/fine-tuning-the-llm-mistral-7b-instruct-v0-3-249c1814ceaf
 
- ## Intended uses & limitations
 
- More information needed
 
 ## Training and evaluation data
 
- More information needed
+ Fine Tuning and Evaluation: https://github.com/frank-morales2020/MLxDL/blob/main/FineTuning_LLM_Mistral_7B_Instruct_v0_1_for_text_to_SQL_EVALDATA.ipynb
 
- ## Training procedure
 
 ### Training hyperparameters
 
@@ -51,6 +48,39 @@ The following hyperparameters were used during training:
 - lr_scheduler_warmup_steps: 15
 - num_epochs: 3
 
+ from transformers import TrainingArguments
+
+ args = TrainingArguments(
+     output_dir="Mistral-7B-text-to-sql-flash-attention-2-dataeval",  # directory to save to and repository id
+
+     num_train_epochs=3,                     # number of training epochs
+     per_device_train_batch_size=3,          # batch size per device during training
+     gradient_accumulation_steps=8,          # number of steps before a backward/update pass
+     gradient_checkpointing=True,            # use gradient checkpointing to save memory
+     optim="adamw_torch_fused",              # use the fused AdamW optimizer
+     logging_steps=10,                       # log every 10 steps
+     # save_strategy="epoch",                # (disabled) save a checkpoint every epoch
+     learning_rate=2e-4,                     # learning rate, based on the QLoRA paper
+     bf16=True,                              # use bfloat16 precision
+     tf32=True,                              # use tf32 precision
+     max_grad_norm=0.3,                      # max gradient norm, based on the QLoRA paper
+     warmup_ratio=0.03,                      # warmup ratio, based on the QLoRA paper
+     weight_decay=0.01,
+     lr_scheduler_type="constant",           # use a constant learning rate scheduler
+     push_to_hub=True,                       # push the model to the Hugging Face Hub
+     report_to="tensorboard",                # report metrics to TensorBoard
+     hub_token=access_token_write,           # token with write access, used for pushing to the Hub
+     load_best_model_at_end=True,
+     logging_dir="/content/gdrive/MyDrive/model/Mistral-7B-text-to-sql-flash-attention-2-dataeval/logs",
+
+     evaluation_strategy="steps",
+     eval_steps=10,
+     save_strategy="steps",
+     save_steps=10,
+     metric_for_best_model="loss",
+     warmup_steps=15,
+ )
+
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
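
For context, a minimal sketch of how TrainingArguments like the ones added above are typically wired into a QLoRA fine-tuning run for this kind of text-to-SQL task. It is an illustration rather than the exact code from the linked notebook: it assumes a trl release (around 0.7/0.8) in which SFTTrainer accepts TrainingArguments, tokenizer and max_seq_length directly, it reuses the args object defined above, and the base model id, dataset files, LoRA values and sequence length are placeholder assumptions.

# Sketch only: QLoRA-style fine-tuning around the `args` TrainingArguments defined above.
# Model id, dataset files, LoRA values and max_seq_length are placeholder assumptions.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTTrainer

model_id = "mistralai/Mistral-7B-Instruct-v0.3"  # assumed base model

# Load the base model in 4-bit (QLoRA) with FlashAttention-2, matching the repo name above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# LoRA adapter configuration, values in the spirit of the QLoRA paper
peft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
)

# Placeholder text-to-SQL splits; each example is assumed to carry a "text" field
train_dataset = load_dataset("json", data_files="train_dataset.json", split="train")
eval_dataset = load_dataset("json", data_files="eval_dataset.json", split="train")

trainer = SFTTrainer(
    model=model,
    args=args,                      # the TrainingArguments shown in the diff above
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,      # required since evaluation_strategy="steps"
    peft_config=peft_config,        # SFTTrainer wraps the quantized model with the LoRA adapter
    tokenizer=tokenizer,
    dataset_text_field="text",
    max_seq_length=3072,
    packing=True,
)
trainer.train()                     # trains, evaluates every 10 steps and pushes to the Hub per `args`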