---
base_model: meta-llama/Meta-Llama-3-8B
datasets:
- generator
library_name: peft
license: llama3
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: NEW-Meta-Llama-3-8B-MEDAL-flash-attention-2-cosine-evaldata
  results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# NEW-Meta-Llama-3-8B-MEDAL-flash-attention-2-cosine-evaldata
This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on the generator dataset.
It achieves the following results on the evaluation set:
- Accuracy (predictions on the evaluation dataset, sample of 10 examples): 70.00%
## Model description
Article: https://medium.com/@frankmorales_91352/fine-tuning-meta-llama-3-8b-with-medal-a-refined-approach-for-enhanced-medical-language-b924d226b09d
## Training and evaluation data
Article: https://medium.com/@frankmorales_91352/fine-tuning-meta-llama-3-8b-with-medal-a-refined-approach-for-enhanced-medical-language-b924d226b09d
Fine-Tuning: https://github.com/frank-morales2020/MLxDL/blob/main/FineTuning_LLM_Meta_Llama_3_8B_for_MEDAL_EVALDATA.ipynb
Evaluation: https://github.com/frank-morales2020/MLxDL/blob/main/Meta_Llama_3_8B_for_MEDAL_EVALUATOR_evaldata.ipynb
## Training procedure
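Training used early stopping with a patience of 5 evaluations:

    from transformers import EarlyStoppingCallback

    # `trainer` is the already-constructed trainer instance (not shown here).
    # Stop when the monitored metric has not improved for 5 consecutive evaluations.
    trainer.add_callback(EarlyStoppingCallback(early_stopping_patience=5))
    trainer.train()
    trainer.save_model()
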
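
To illustrate what a patience of 5 means, here is a minimal pure-Python sketch of the patience logic. It is a simplified illustration, not the `transformers` implementation; the function name `first_stop_index` and the example loss values are hypothetical.

```python
def first_stop_index(eval_losses, patience=5, threshold=0.0):
    """Return the index of the evaluation at which training would stop,
    or None if the patience is never exhausted.

    An evaluation counts as an improvement only if the loss drops below
    the best value seen so far by more than `threshold`.
    """
    best = float("inf")
    evals_without_improvement = 0
    for index, loss in enumerate(eval_losses):
        if best - loss > threshold:
            best = loss
            evals_without_improvement = 0
        else:
            evals_without_improvement += 1
        if evals_without_improvement >= patience:
            return index
    return None


# Hypothetical loss curve: improves twice, then plateaus.
plateau = [2.40, 2.30, 2.30, 2.30, 2.30, 2.30, 2.30]
print(first_stop_index(plateau))

# A curve that keeps improving never triggers a stop.
print(first_stop_index([2.40, 2.30, 2.20]))
```

If evaluation runs every 200 steps, a patience of 5 corresponds to 1,000 training steps without improvement before training halts.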
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 1
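
The `total_train_batch_size` entry follows from the other values in the list. A short sketch of the arithmetic, assuming a single device (the number of GPUs is not stated in this card, so `num_devices=1` is an assumption):

```python
def total_train_batch_size(per_device_batch_size, gradient_accumulation_steps, num_devices=1):
    """Number of examples that contribute to one optimizer update."""
    return per_device_batch_size * gradient_accumulation_steps * num_devices


# Values from the hyperparameter list: train_batch_size=4, gradient_accumulation_steps=4.
print(total_train_batch_size(4, 4))
```

A per-device batch size of 2 with 8 accumulation steps yields the same total of 16 on one device.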

The training arguments as configured in code. Several values here (learning rate, per-device batch size, gradient accumulation steps, warmup ratio) differ from the auto-generated list above:

    from transformers import TrainingArguments

    args = TrainingArguments(
        output_dir="/content/gdrive/MyDrive/model/NEW-Meta-Llama-3-8B-MEDAL-flash-attention-2-cosine-evaldata",
        # num_train_epochs=3,                # number of training epochs
        num_train_epochs=1,                  # number of training epochs for the proof of concept
        per_device_train_batch_size=2,       # batch size per device during training
        gradient_accumulation_steps=8,       # micro-batches accumulated before each optimizer update
        gradient_checkpointing=True,         # use gradient checkpointing to save memory
        optim="adamw_torch_fused",           # use the fused AdamW optimizer
        logging_steps=200,                   # log every 200 steps
        learning_rate=2e-4,                  # learning rate based on the QLoRA paper (as used in the first model)
        bf16=True,                           # use bfloat16 precision
        tf32=True,                           # use TF32 precision
        max_grad_norm=1.0,                   # max gradient norm based on the QLoRA paper
        warmup_ratio=0.05,                   # warmup ratio (the QLoRA paper uses 0.03)
        weight_decay=0.01,
        lr_scheduler_type="cosine",          # cosine annealing learning rate schedule
        push_to_hub=True,                    # push the model to the Hub
        report_to="tensorboard",             # report metrics to TensorBoard
        gradient_checkpointing_kwargs={"use_reentrant": True},
        load_best_model_at_end=True,
        logging_dir="/content/gdrive/MyDrive/model/NEW-Meta-Llama-3-8B-MEDAL-flash-attention-2-cosine-evaldata/logs",
        evaluation_strategy="steps",         # evaluate at step intervals
        eval_steps=200,                      # evaluate every 200 steps
        save_strategy="steps",               # save checkpoints at step intervals
        save_steps=200,                      # save every 200 steps (aligned with eval_steps)
        metric_for_best_model="loss",
    )

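
The `lr_scheduler_type="cosine"` and `warmup_ratio` settings describe a linear warmup followed by a half-cosine decay to zero. The sketch below reproduces that shape in plain Python. It is an illustration rather than the library code; the function name `cosine_lr` is hypothetical, and `total_steps=3600` is an assumption taken from the last logged step, since the actual total number of optimizer steps is not stated in this card.

```python
import math


def cosine_lr(step, total_steps, base_lr, warmup_ratio):
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumed values: base_lr and warmup_ratio from the TrainingArguments snippet,
# total_steps from the last logged step (an assumption, see above).
TOTAL_STEPS = 3600
for step in (0, 90, 180, 1800, 3600):
    lr = cosine_lr(step, TOTAL_STEPS, base_lr=2e-4, warmup_ratio=0.05)
    print(f"step {step:>4}: lr = {lr:.2e}")
```

The learning rate peaks at `base_lr` when warmup ends and then decays smoothly, so late training steps make progressively smaller updates.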
### Training results
| Step | Training Loss | Validation Loss |
|:----:|:-------------:|:---------------:|
| 200  | 2.505300      | 2.382469        |
| 3600 | 2.226800      | 2.223289        |
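
A loss value can be easier to interpret as perplexity. The sketch below converts the validation losses reported above, assuming they are mean per-token cross-entropy values in natural-log units (the usual convention for causal language model training, but not confirmed in this card).

```python
import math

# Validation losses copied from the training results above (step -> loss).
validation_loss = {200: 2.382469, 3600: 2.223289}


def perplexity(mean_cross_entropy: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy)


for step, loss in validation_loss.items():
    print(f"step {step}: loss {loss:.6f} -> perplexity {perplexity(loss):.2f}")
```

Under that assumption, a lower validation loss at step 3600 than at step 200 translates directly into a lower perplexity on the evaluation set.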
### Framework versions
- PEFT 0.11.1
- Transformers 4.41.2
- Pytorch 2.3.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1