frankmorales2020 committed on
Commit
81230bf
1 Parent(s): 8ab2d00

Update README.md

Files changed (1)
  1. README.md +35 -5
README.md CHANGED
@@ -24,17 +24,14 @@ It achieves the following results on the evaluation set:
 
 ## Model description
 
- More information needed
+ Article: https://medium.com/@frankmorales_91352/fine-tuning-the-llm-mistral-7b-instruct-v0-3-249c1814ceaf
 
- ## Intended uses & limitations
 
- More information needed
 
 ## Training and evaluation data
 
- More information needed
+ Fine Tuning and Evaluation: https://github.com/frank-morales2020/MLxDL/blob/main/FineTuning_LLM_Mistral_7B_Instruct_v0_1_for_text_to_SQL_EVALDATA.ipynb
 
- ## Training procedure
 
 ### Training hyperparameters
 
@@ -51,6 +48,39 @@ The following hyperparameters were used during training:
 - lr_scheduler_warmup_steps: 15
 - num_epochs: 3
 
+ from transformers import TrainingArguments
+
+ args = TrainingArguments(
+     output_dir="Mistral-7B-text-to-sql-flash-attention-2-dataeval",  # directory to save to and repository id
+
+     num_train_epochs=3,                     # number of training epochs
+     per_device_train_batch_size=3,          # batch size per device during training
+     gradient_accumulation_steps=8,          # number of steps before a backward/update pass
+     gradient_checkpointing=True,            # use gradient checkpointing to save memory
+     optim="adamw_torch_fused",              # use the fused AdamW optimizer
+     logging_steps=10,                       # log every 10 steps
+     # save_strategy="epoch",                # (disabled) save a checkpoint every epoch
+     learning_rate=2e-4,                     # learning rate, based on the QLoRA paper
+     bf16=True,                              # use bfloat16 precision
+     tf32=True,                              # use tf32 precision
+     max_grad_norm=0.3,                      # max gradient norm, based on the QLoRA paper
+     warmup_ratio=0.03,                      # warmup ratio, based on the QLoRA paper
+     weight_decay=0.01,
+     lr_scheduler_type="constant",           # use a constant learning rate scheduler
+     push_to_hub=True,                       # push the model to the Hugging Face Hub
+     report_to="tensorboard",                # report metrics to TensorBoard
+     hub_token=access_token_write,           # token with write access, used for pushing to the Hub
+     load_best_model_at_end=True,
+     logging_dir="/content/gdrive/MyDrive/model/Mistral-7B-text-to-sql-flash-attention-2-dataeval/logs",
+
+     evaluation_strategy="steps",
+     eval_steps=10,
+     save_strategy="steps",
+     save_steps=10,
+     metric_for_best_model="loss",
+     warmup_steps=15,
+ )
+
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
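
For context, a minimal sketch of how TrainingArguments like the ones added above are typically wired into a QLoRA fine-tuning run for this kind of text-to-SQL task. It is an illustration rather than the exact code from the linked notebook: it assumes a trl release (around 0.7/0.8) in which SFTTrainer accepts TrainingArguments, tokenizer and max_seq_length directly, it reuses the args object defined above, and the base model id, dataset files, LoRA values and sequence length are placeholder assumptions.

# Sketch only: QLoRA-style fine-tuning around the `args` TrainingArguments defined above.
# Model id, dataset files, LoRA values and max_seq_length are placeholder assumptions.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTTrainer

model_id = "mistralai/Mistral-7B-Instruct-v0.3"  # assumed base model

# Load the base model in 4-bit (QLoRA) with FlashAttention-2, matching the repo name above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# LoRA adapter configuration, values in the spirit of the QLoRA paper
peft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
)

# Placeholder text-to-SQL splits; each example is assumed to carry a "text" field
train_dataset = load_dataset("json", data_files="train_dataset.json", split="train")
eval_dataset = load_dataset("json", data_files="eval_dataset.json", split="train")

trainer = SFTTrainer(
    model=model,
    args=args,                      # the TrainingArguments shown in the diff above
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,      # required since evaluation_strategy="steps"
    peft_config=peft_config,        # SFTTrainer wraps the quantized model with the LoRA adapter
    tokenizer=tokenizer,
    dataset_text_field="text",
    max_seq_length=3072,
    packing=True,
)
trainer.train()                     # trains, evaluates every 10 steps and pushes to the Hub per `args`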