llm3br256

This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on the spinny dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0079

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5.0

Training results

Training Loss Epoch Step Validation Loss
0.0598 0.0578 5 0.0557
0.0388 0.1156 10 0.0346
0.0275 0.1734 15 0.0276
0.0218 0.2312 20 0.0228
0.0236 0.2890 25 0.0203
0.0182 0.3468 30 0.0179
0.019 0.4046 35 0.0162
0.017 0.4624 40 0.0147
0.0147 0.5202 45 0.0137
0.0118 0.5780 50 0.0132
0.0107 0.6358 55 0.0127
0.016 0.6936 60 0.0123
0.0144 0.7514 65 0.0116
0.0119 0.8092 70 0.0113
0.0111 0.8671 75 0.0109
0.012 0.9249 80 0.0107
0.0139 0.9827 85 0.0102
0.0085 1.0405 90 0.0104
0.01 1.0983 95 0.0102
0.009 1.1561 100 0.0099
0.0094 1.2139 105 0.0098
0.0069 1.2717 110 0.0099
0.0108 1.3295 115 0.0096
0.0066 1.3873 120 0.0095
0.0089 1.4451 125 0.0094
0.0084 1.5029 130 0.0093
0.0102 1.5607 135 0.0093
0.01 1.6185 140 0.0091
0.0098 1.6763 145 0.0088
0.0071 1.7341 150 0.0087
0.0094 1.7919 155 0.0086
0.008 1.8497 160 0.0086
0.01 1.9075 165 0.0085
0.0084 1.9653 170 0.0086
0.0058 2.0231 175 0.0087
0.0056 2.0809 180 0.0090
0.0077 2.1387 185 0.0086
0.0061 2.1965 190 0.0086
0.008 2.2543 195 0.0083
0.0058 2.3121 200 0.0083
0.0047 2.3699 205 0.0084
0.0066 2.4277 210 0.0084
0.0055 2.4855 215 0.0082
0.0056 2.5434 220 0.0083
0.005 2.6012 225 0.0082
0.0065 2.6590 230 0.0082
0.0061 2.7168 235 0.0081
0.0052 2.7746 240 0.0082
0.0053 2.8324 245 0.0081
0.0058 2.8902 250 0.0079
0.0052 2.9480 255 0.0078
0.0071 3.0058 260 0.0080
0.0051 3.0636 265 0.0082
0.0033 3.1214 270 0.0086
0.004 3.1792 275 0.0084
0.0032 3.2370 280 0.0082
0.0042 3.2948 285 0.0082
0.0035 3.3526 290 0.0082
0.0041 3.4104 295 0.0081
0.0048 3.4682 300 0.0080
0.0046 3.5260 305 0.0080
0.004 3.5838 310 0.0080
0.0032 3.6416 315 0.0081
0.0039 3.6994 320 0.0084
0.0042 3.7572 325 0.0083
0.0046 3.8150 330 0.0080
0.0035 3.8728 335 0.0081
0.0048 3.9306 340 0.0081
0.0056 3.9884 345 0.0080
0.0025 4.0462 350 0.0080
0.0035 4.1040 355 0.0082
0.0028 4.1618 360 0.0083
0.0028 4.2197 365 0.0084
0.003 4.2775 370 0.0085
0.0033 4.3353 375 0.0085
0.003 4.3931 380 0.0086
0.0022 4.4509 385 0.0086
0.0028 4.5087 390 0.0086
0.0028 4.5665 395 0.0085
0.0031 4.6243 400 0.0085
0.0038 4.6821 405 0.0084
0.0024 4.7399 410 0.0084
0.0024 4.7977 415 0.0084
0.0024 4.8555 420 0.0084
0.0026 4.9133 425 0.0084
0.0029 4.9711 430 0.0084

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.1
  • Pytorch 2.4.0+cu121
  • Datasets 3.1.0
  • Tokenizers 0.20.3
Downloads last month
2
Inference API
Unable to determine this model’s pipeline type. Check the docs .

Model tree for sizhkhy/spinny

Adapter
(242)
this model