cls_alldata_llama3_v1

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the generator dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4523 (validation loss at the final logged step, 680)
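
The card does not name the training objective. If the reported loss is the usual mean token-level cross-entropy in nats, which Transformers computes for causal language models, it can be read as a perplexity of exp(loss). This is an assumption, and the snippet is only a conversion aid:

```python
import math


def perplexity(mean_ce_loss: float) -> float:
    """Perplexity from a mean token-level cross-entropy loss measured in nats (assumed)."""
    return math.exp(mean_ce_loss)


eval_loss = 0.4523  # evaluation loss reported in this card
print(f"{perplexity(eval_loss):.3f}")  # 1.572
```

Lower is better. A perplexity of 1.0 would mean the model assigns probability 1 to every target token.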

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 2
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 2
  • mixed_precision_training: Native AMP
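
The batch-size and scheduler settings interact in two ways. First, total_train_batch_size is train_batch_size × gradient_accumulation_steps × number of devices, so the listed value of 8 matches a single device. That is an assumption, because the card does not state the device count. Second, in Transformers the plain `constant` scheduler type applies no warmup. Only `constant_with_warmup` applies one. The listed warmup ratio of 0.03 therefore most likely had no effect on this run. The sketch below compares the two schedules. The run length of 687 steps is a hypothetical estimate read off the log, not a reported value.

```python
import math

# Values copied from the hyperparameter list in this card.
LEARNING_RATE = 2e-4
TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 4
WARMUP_RATIO = 0.03


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Examples that contribute to a single optimizer step."""
    return per_device * accum_steps * num_devices


def warmup_steps(num_training_steps: int, warmup_ratio: float) -> int:
    """Warmup length in optimizer steps; Transformers rounds the ratio up with ceil."""
    return math.ceil(num_training_steps * warmup_ratio)


def lr_at(step: int, base_lr: float, scheduler: str, num_warmup: int) -> float:
    """Learning rate at a given optimizer step for the two constant schedules."""
    if scheduler == "constant":
        # No warmup: the rate is base_lr from the very first step.
        return base_lr
    if scheduler == "constant_with_warmup":
        # Linear ramp from 0 to base_lr over num_warmup steps, then flat.
        if step < num_warmup:
            return base_lr * step / max(1, num_warmup)
        return base_lr
    raise ValueError(f"unsupported scheduler: {scheduler}")


# Assumes a single device, which reproduces total_train_batch_size = 8.
print(effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))  # 8

# Hypothetical run length: about 687 optimizer steps, a rough estimate from the
# log (step 680 is logged at epoch 1.9796).
n_warm = warmup_steps(687, WARMUP_RATIO)
print(n_warm)                                                   # 21
print(lr_at(0, LEARNING_RATE, "constant", n_warm))              # 0.0002
print(lr_at(0, LEARNING_RATE, "constant_with_warmup", n_warm))  # 0.0
```

To actually get a warmup with a flat rate afterwards, the scheduler type would need to be `constant_with_warmup`.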

Training results

Training Loss | Epoch  | Step | Validation Loss
------------- | ------ | ---- | ---------------
0.6921        | 0.0582 | 20   | 0.6831
0.5975        | 0.1164 | 40   | 0.6416
0.6107        | 0.1747 | 60   | 0.6082
0.5609        | 0.2329 | 80   | 0.5883
0.5857        | 0.2911 | 100  | 0.5761
0.5386        | 0.3493 | 120  | 0.5660
0.5176        | 0.4076 | 140  | 0.5529
0.5317        | 0.4658 | 160  | 0.5379
0.5244        | 0.5240 | 180  | 0.5292
0.5218        | 0.5822 | 200  | 0.5234
0.5003        | 0.6405 | 220  | 0.5207
0.5024        | 0.6987 | 240  | 0.5096
0.4913        | 0.7569 | 260  | 0.5062
0.5174        | 0.8151 | 280  | 0.5003
0.4675        | 0.8734 | 300  | 0.4968
0.5137        | 0.9316 | 320  | 0.4903
0.4883        | 0.9898 | 340  | 0.4869
0.3616        | 1.0480 | 360  | 0.4935
0.3713        | 1.1063 | 380  | 0.4890
0.3650        | 1.1645 | 400  | 0.4856
0.3732        | 1.2227 | 420  | 0.4838
0.3717        | 1.2809 | 440  | 0.4842
0.3657        | 1.3392 | 460  | 0.4811
0.3767        | 1.3974 | 480  | 0.4762
0.3859        | 1.4556 | 500  | 0.4763
0.3773        | 1.5138 | 520  | 0.4712
0.3615        | 1.5721 | 540  | 0.4671
0.3656        | 1.6303 | 560  | 0.4666
0.3497        | 1.6885 | 580  | 0.4658
0.3818        | 1.7467 | 600  | 0.4621
0.3759        | 1.8049 | 620  | 0.4626
0.3539        | 1.8632 | 640  | 0.4551
0.3985        | 1.9214 | 660  | 0.4525
0.3668        | 1.9796 | 680  | 0.4523
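
The log can also be inspected programmatically. The sketch below copies the table rows verbatim into a string. It finds the lowest validation loss, lists the steps where validation loss ticked up, and averages training loss per epoch. The helper names are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass

# Rows copied from the "Training results" table:
# training loss, epoch, step, validation loss.
LOG = """\
0.6921 0.0582 20 0.6831
0.5975 0.1164 40 0.6416
0.6107 0.1747 60 0.6082
0.5609 0.2329 80 0.5883
0.5857 0.2911 100 0.5761
0.5386 0.3493 120 0.5660
0.5176 0.4076 140 0.5529
0.5317 0.4658 160 0.5379
0.5244 0.5240 180 0.5292
0.5218 0.5822 200 0.5234
0.5003 0.6405 220 0.5207
0.5024 0.6987 240 0.5096
0.4913 0.7569 260 0.5062
0.5174 0.8151 280 0.5003
0.4675 0.8734 300 0.4968
0.5137 0.9316 320 0.4903
0.4883 0.9898 340 0.4869
0.3616 1.0480 360 0.4935
0.3713 1.1063 380 0.4890
0.365 1.1645 400 0.4856
0.3732 1.2227 420 0.4838
0.3717 1.2809 440 0.4842
0.3657 1.3392 460 0.4811
0.3767 1.3974 480 0.4762
0.3859 1.4556 500 0.4763
0.3773 1.5138 520 0.4712
0.3615 1.5721 540 0.4671
0.3656 1.6303 560 0.4666
0.3497 1.6885 580 0.4658
0.3818 1.7467 600 0.4621
0.3759 1.8049 620 0.4626
0.3539 1.8632 640 0.4551
0.3985 1.9214 660 0.4525
0.3668 1.9796 680 0.4523
"""


@dataclass(frozen=True)
class LogRow:
    train_loss: float
    epoch: float
    step: int
    val_loss: float


def parse_log(text: str) -> list[LogRow]:
    """Parse whitespace-separated rows into LogRow records."""
    rows = []
    for line in text.strip().splitlines():
        train, epoch, step, val = line.split()
        rows.append(LogRow(float(train), float(epoch), int(step), float(val)))
    return rows


def best_checkpoint(rows: list[LogRow]) -> LogRow:
    """Row with the lowest validation loss."""
    return min(rows, key=lambda r: r.val_loss)


def val_loss_increases(rows: list[LogRow]) -> list[int]:
    """Steps at which validation loss rose relative to the previous evaluation."""
    return [cur.step for prev, cur in zip(rows, rows[1:]) if cur.val_loss > prev.val_loss]


def mean_train_loss_by_epoch(rows: list[LogRow]) -> dict[int, float]:
    """Average logged training loss grouped by epoch index (0 = first pass)."""
    buckets: dict[int, list[float]] = {}
    for r in rows:
        buckets.setdefault(int(r.epoch), []).append(r.train_loss)
    return {epoch: sum(v) / len(v) for epoch, v in buckets.items()}


rows = parse_log(LOG)
best = best_checkpoint(rows)
print(best.step, best.val_loss)   # 680 0.4523
print(val_loss_increases(rows))   # [360, 440, 500, 620]
```

On this log the lowest validation loss falls at the final logged step. The largest validation uptick comes right after the epoch boundary at step 360, where training loss also drops sharply (0.4883 to 0.3616). That pattern is common when a model begins a second pass over the same training examples.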

Framework versions

  • PEFT 0.11.1
  • Transformers 4.41.1
  • PyTorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1
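
Before trying to reproduce the run, you can compare a local environment against these versions. The following convenience sketch uses only the standard library and is not part of the original training setup. It treats `2.3.0+cu121` and `2.3.0` as the same release, because `+cu121` is a PEP 440 local label for the CUDA build.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions"; keys are PyPI distribution names
# (PyTorch is distributed as "torch").
PINNED = {
    "peft": "0.11.1",
    "transformers": "4.41.1",
    "torch": "2.3.0+cu121",
    "datasets": "2.19.1",
    "tokenizers": "0.19.1",
}


def public_version(v: str) -> str:
    """Drop a PEP 440 local label such as '+cu121' (the CUDA build tag)."""
    return v.split("+", 1)[0]


def compare_versions(expected: dict[str, str]) -> dict[str, tuple[str, str | None, bool]]:
    """Map each package to (expected, installed or None, public versions match)."""
    report = {}
    for name, want in expected.items():
        try:
            have = version(name)
        except PackageNotFoundError:
            have = None
        match = have is not None and public_version(have) == public_version(want)
        report[name] = (want, have, match)
    return report


for name, (want, have, ok) in compare_versions(PINNED).items():
    print(f"{name:<12} expected {want:<12} installed {have or '-':<12} {'ok' if ok else 'differs'}")
```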

Model tree for Sorour/cls_alldata_llama3_v1

This model is an adapter for meta-llama/Meta-Llama-3-8B-Instruct.
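
This repository holds adapter weights rather than a full model, so inference needs the base weights as well. A minimal loading sketch with PEFT is shown below. It assumes the adapter was trained for causal language modeling, which the card does not state, and that you have been granted access to the gated meta-llama base repository. `load_model` is a hypothetical helper, and bfloat16 is a choice made here, not a setting from the card. `PeftModel.from_pretrained` and the `Auto*` classes are the standard PEFT and Transformers entry points.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "Sorour/cls_alldata_llama3_v1"


def load_model(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the adapter (assumes a causal-LM adapter)."""
    # Imports are deferred so this module can be imported without the ML stack.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
    # Wrap the base model with the adapter weights downloaded from the Hub.
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return tokenizer, model
```

If the adapter turns out to target a different task, swap in the matching `AutoModelFor...` class for the base model.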