llama-3.2-3B-lora-r256

This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on the gommt-oneshot-train dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0062
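
The card does not include usage code. Since the framework list names PEFT, one plausible way to use the model is to load the base model and attach the adapter on top of it. The following is a minimal sketch under assumptions, not the author's documented procedure: the adapter repository id is taken from the page's model tree line, the function name is hypothetical, and access to the base model may require accepting its license on the Hub.

```python
BASE_MODEL_ID = "meta-llama/Llama-3.2-3B-Instruct"  # base model named in this card
ADAPTER_ID = "sizhkhy/gommt"  # assumption: repo id taken from the page's model tree


def load_model_with_adapter(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model, then attach the adapter weights on top of it.

    Hypothetical helper for illustration; it downloads weights when called.
    """
    # Imported lazily so this module can be imported without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the fine-tuned adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `load_model_with_adapter()` returns a tokenizer and a model that can be used with the usual `generate` workflow. Prompt format and generation settings are not specified in this card.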

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5.0
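
Two of the values above follow from the others and can be checked with a little arithmetic. The sketch below shows that 4 × 8 gives the listed total batch size of 32 (consistent with a single device, which is an inference and not stated in the card), and it illustrates the general shape of a cosine schedule with linear warmup. The total step count of 670 is an assumption taken from the last logged step in the results table, and the function names are hypothetical. The exact scheduler implementation used in the run may differ in detail.

```python
import math

PEAK_LR = 1e-4       # learning_rate from the card
WARMUP_RATIO = 0.1   # lr_scheduler_warmup_ratio from the card
TOTAL_STEPS = 670    # assumption: last step logged in the results table


def effective_batch_size(per_device: int, accumulation: int, devices: int = 1) -> int:
    """Samples contributing to one optimizer update."""
    return per_device * accumulation * devices


def warmup_steps(total_steps: int = TOTAL_STEPS, ratio: float = WARMUP_RATIO) -> int:
    """Number of steps spent ramping the learning rate up linearly."""
    return math.ceil(total_steps * ratio)


def cosine_lr(step: int, total_steps: int = TOTAL_STEPS,
              peak_lr: float = PEAK_LR, ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup to peak_lr, then half-cosine decay towards zero."""
    warm = warmup_steps(total_steps, ratio)
    if step < warm:
        return peak_lr * step / max(1, warm)
    progress = (step - warm) / max(1, total_steps - warm)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(4, 8))
    for s in (0, warmup_steps(), TOTAL_STEPS // 2, TOTAL_STEPS):
        print(f"step {s:>3}: lr = {cosine_lr(s):.2e}")
```

Under these assumptions the learning rate starts at zero, reaches its peak of 0.0001 at the end of warmup, and decays to roughly zero by the final step.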

Training results

Training Loss Epoch Step Validation Loss
0.0567 0.0372 5 0.0548
0.0421 0.0745 10 0.0434
0.0347 0.1117 15 0.0384
0.0306 0.1490 20 0.0344
0.0325 0.1862 25 0.0302
0.022 0.2235 30 0.0266
0.0251 0.2607 35 0.0241
0.0223 0.2980 40 0.0221
0.0174 0.3352 45 0.0208
0.0218 0.3724 50 0.0193
0.0208 0.4097 55 0.0189
0.0193 0.4469 60 0.0175
0.0178 0.4842 65 0.0167
0.017 0.5214 70 0.0159
0.0199 0.5587 75 0.0150
0.0185 0.5959 80 0.0150
0.0167 0.6331 85 0.0148
0.0159 0.6704 90 0.0143
0.0153 0.7076 95 0.0138
0.0144 0.7449 100 0.0136
0.0141 0.7821 105 0.0131
0.0156 0.8194 110 0.0129
0.0116 0.8566 115 0.0126
0.0154 0.8939 120 0.0123
0.0116 0.9311 125 0.0121
0.0167 0.9683 130 0.0118
0.0202 1.0056 135 0.0115
0.0126 1.0428 140 0.0114
0.0122 1.0801 145 0.0114
0.0126 1.1173 150 0.0114
0.0097 1.1546 155 0.0117
0.01 1.1918 160 0.0117
0.0112 1.2291 165 0.0111
0.0102 1.2663 170 0.0102
0.0114 1.3035 175 0.0096
0.0109 1.3408 180 0.0094
0.0119 1.3780 185 0.0096
0.0099 1.4153 190 0.0095
0.01 1.4525 195 0.0094
0.0117 1.4898 200 0.0093
0.0121 1.5270 205 0.0090
0.0104 1.5642 210 0.0088
0.0123 1.6015 215 0.0086
0.0092 1.6387 220 0.0084
0.012 1.6760 225 0.0086
0.0088 1.7132 230 0.0086
0.0098 1.7505 235 0.0080
0.01 1.7877 240 0.0083
0.0089 1.8250 245 0.0080
0.0094 1.8622 250 0.0082
0.0086 1.8994 255 0.0081
0.0092 1.9367 260 0.0080
0.0097 1.9739 265 0.0081
0.0074 2.0112 270 0.0079
0.0071 2.0484 275 0.0080
0.0087 2.0857 280 0.0079
0.0078 2.1229 285 0.0078
0.0071 2.1601 290 0.0078
0.0062 2.1974 295 0.0077
0.0072 2.2346 300 0.0078
0.0078 2.2719 305 0.0079
0.0071 2.3091 310 0.0079
0.0064 2.3464 315 0.0078
0.0075 2.3836 320 0.0077
0.0075 2.4209 325 0.0074
0.007 2.4581 330 0.0075
0.0067 2.4953 335 0.0074
0.0054 2.5326 340 0.0076
0.006 2.5698 345 0.0069
0.007 2.6071 350 0.0069
0.0058 2.6443 355 0.0069
0.0062 2.6816 360 0.0070
0.0075 2.7188 365 0.0070
0.0062 2.7561 370 0.0067
0.0064 2.7933 375 0.0067
0.0076 2.8305 380 0.0067
0.0062 2.8678 385 0.0067
0.0076 2.9050 390 0.0065
0.0064 2.9423 395 0.0064
0.006 2.9795 400 0.0065
0.0045 3.0168 405 0.0066
0.0043 3.0540 410 0.0067
0.0045 3.0912 415 0.0066
0.0038 3.1285 420 0.0067
0.0041 3.1657 425 0.0068
0.0042 3.2030 430 0.0067
0.0046 3.2402 435 0.0066
0.0047 3.2775 440 0.0066
0.0045 3.3147 445 0.0065
0.005 3.3520 450 0.0065
0.0049 3.3892 455 0.0067
0.0044 3.4264 460 0.0065
0.0054 3.4637 465 0.0064
0.0045 3.5009 470 0.0064
0.0037 3.5382 475 0.0064
0.0039 3.5754 480 0.0063
0.0044 3.6127 485 0.0063
0.0039 3.6499 490 0.0063
0.0045 3.6872 495 0.0064
0.0042 3.7244 500 0.0064
0.0044 3.7616 505 0.0063
0.0045 3.7989 510 0.0063
0.0041 3.8361 515 0.0063
0.0042 3.8734 520 0.0063
0.004 3.9106 525 0.0064
0.0042 3.9479 530 0.0064
0.0043 3.9851 535 0.0062
0.003 4.0223 540 0.0062
0.003 4.0596 545 0.0064
0.0038 4.0968 550 0.0064
0.0032 4.1341 555 0.0063
0.003 4.1713 560 0.0062
0.0025 4.2086 565 0.0063
0.0025 4.2458 570 0.0062
0.0029 4.2831 575 0.0063
0.0027 4.3203 580 0.0062
0.0029 4.3575 585 0.0063
0.0029 4.3948 590 0.0063
0.0029 4.4320 595 0.0063
0.0028 4.4693 600 0.0062
0.0035 4.5065 605 0.0062
0.0024 4.5438 610 0.0062
0.0026 4.5810 615 0.0062
0.0028 4.6182 620 0.0062
0.0024 4.6555 625 0.0062
0.0031 4.6927 630 0.0061
0.0028 4.7300 635 0.0062
0.0025 4.7672 640 0.0062
0.003 4.8045 645 0.0062
0.0027 4.8417 650 0.0062
0.0027 4.8790 655 0.0062
0.0028 4.9162 660 0.0062
0.0029 4.9534 665 0.0062
0.0029 4.9907 670 0.0062
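
The rows above are whitespace-separated in the order training loss, epoch, step, validation loss. As a rough illustration of how the log can be inspected, the sketch below parses a handful of rows copied from the table, finds the one with the lowest validation loss, and reports the gap between training and validation loss at the final step. The helper names are hypothetical, and only a sample of rows is embedded to keep the example short.

```python
from typing import List, NamedTuple


class LogRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float


# A sample of rows copied verbatim from the results table.
SAMPLE_ROWS = """\
0.0567 0.0372 5 0.0548
0.0202 1.0056 135 0.0115
0.0074 2.0112 270 0.0079
0.0045 3.0168 405 0.0066
0.003 4.0223 540 0.0062
0.0031 4.6927 630 0.0061
0.0029 4.9907 670 0.0062
"""


def parse_rows(text: str) -> List[LogRow]:
    """Parse 'train_loss epoch step val_loss' lines, skipping blank lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue
        rows.append(LogRow(float(parts[0]), float(parts[1]), int(parts[2]), float(parts[3])))
    return rows


def best_row(rows: List[LogRow]) -> LogRow:
    """Row with the lowest validation loss (first one wins on ties)."""
    return min(rows, key=lambda r: r.val_loss)


if __name__ == "__main__":
    rows = parse_rows(SAMPLE_ROWS)
    best = best_row(rows)
    last = rows[-1]
    print(f"lowest validation loss {best.val_loss} at step {best.step}")
    print(f"final train/validation gap: {last.val_loss - last.train_loss:.4f}")
```

In this sample the validation loss flattens out at about 0.0062 from epoch 4 onward while the training loss keeps falling, so the later steps mostly widen the gap between the two.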

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.1
  • Pytorch 2.4.0+cu121
  • Datasets 3.1.0
  • Tokenizers 0.20.3
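
When reproducing the run, it can help to check the local environment against the versions listed above. The following is a small sketch using only the standard library. The distribution names (for example `torch` for Pytorch) are assumptions about how the packages are installed, and the helper names are hypothetical.

```python
from importlib import metadata
from typing import Callable, Dict, Optional, Tuple

# Versions as listed in this card; keys are assumed pip distribution names.
CARD_VERSIONS: Dict[str, str] = {
    "peft": "0.12.0",
    "transformers": "4.46.1",
    "torch": "2.4.0+cu121",
    "datasets": "3.1.0",
    "tokenizers": "0.20.3",
}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(
    expected: Dict[str, str] = CARD_VERSIONS,
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose version differs to (expected, found)."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in version_mismatches().items():
        print(f"{name}: card lists {wanted}, found {found or 'not installed'}")
```

An exact string comparison is deliberately strict. A different CUDA build suffix on Pytorch, for instance, is reported as a mismatch even though it may not affect results.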

Model tree for sizhkhy/gommt

  • Adapter: this model