llm3br256

This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on the centime dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0123
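This repo ships a PEFT adapter rather than full model weights (see the framework versions below), so it has to be loaded on top of the base model. Below is a minimal loading and generation sketch; the adapter repo id sizhkhy/centime is assumed from this model page, and the dtype choice is an assumption:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id = "sizhkhy/centime"  # assumed: the repo id of this adapter

# Load the base model first, then attach the fine-tuned adapter on top.
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, adapter_id)

# Llama-3.2-3B-Instruct is a chat model, so prompts go through the chat template.
messages = [{"role": "user", "content": "Hello!"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

For latency-sensitive deployment, a LoRA-style adapter loaded this way can usually be folded into the base weights with `model.merge_and_unload()`.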

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5.0
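The training script itself is not included in this card; as a rough sketch, the values above map onto transformers.TrainingArguments as follows. The output_dir and the single-GPU assumption (4 per device × 8 accumulation steps = 32 total) are mine, not from the card:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="llm3br256",          # assumed
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=8,   # 4 per device x 8 steps = 32 effective
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=5.0,
)
```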

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.0612        | 0.0449 | 5    | 0.0560          |
| 0.0411        | 0.0898 | 10   | 0.0357          |
| 0.0353        | 0.1347 | 15   | 0.0301          |
| 0.0286        | 0.1796 | 20   | 0.0264          |
| 0.0282        | 0.2245 | 25   | 0.0239          |
| 0.0223        | 0.2694 | 30   | 0.0224          |
| 0.0242        | 0.3143 | 35   | 0.0209          |
| 0.0211        | 0.3591 | 40   | 0.0203          |
| 0.0178        | 0.4040 | 45   | 0.0201          |
| 0.0206        | 0.4489 | 50   | 0.0196          |
| 0.0196        | 0.4938 | 55   | 0.0193          |
| 0.0173        | 0.5387 | 60   | 0.0193          |
| 0.0184        | 0.5836 | 65   | 0.0193          |
| 0.0194        | 0.6285 | 70   | 0.0191          |
| 0.0182        | 0.6734 | 75   | 0.0185          |
| 0.0169        | 0.7183 | 80   | 0.0183          |
| 0.0176        | 0.7632 | 85   | 0.0178          |
| 0.0158        | 0.8081 | 90   | 0.0176          |
| 0.02          | 0.8530 | 95   | 0.0172          |
| 0.0165        | 0.8979 | 100  | 0.0173          |
| 0.0181        | 0.9428 | 105  | 0.0168          |
| 0.0176        | 0.9877 | 110  | 0.0168          |
| 0.0184        | 1.0348 | 115  | 0.0183          |
| 0.0162        | 1.0797 | 120  | 0.0179          |
| 0.017         | 1.1246 | 125  | 0.0168          |
| 0.0143        | 1.1695 | 130  | 0.0167          |
| 0.0177        | 1.2144 | 135  | 0.0166          |
| 0.0138        | 1.2593 | 140  | 0.0161          |
| 0.0149        | 1.3042 | 145  | 0.0157          |
| 0.0162        | 1.3490 | 150  | 0.0160          |
| 0.0148        | 1.3939 | 155  | 0.0156          |
| 0.0168        | 1.4388 | 160  | 0.0154          |
| 0.0148        | 1.4837 | 165  | 0.0153          |
| 0.0146        | 1.5286 | 170  | 0.0154          |
| 0.0137        | 1.5735 | 175  | 0.0150          |
| 0.0144        | 1.6184 | 180  | 0.0150          |
| 0.0129        | 1.6633 | 185  | 0.0148          |
| 0.0139        | 1.7082 | 190  | 0.0145          |
| 0.013         | 1.7531 | 195  | 0.0145          |
| 0.013         | 1.7980 | 200  | 0.0144          |
| 0.0124        | 1.8429 | 205  | 0.0144          |
| 0.0135        | 1.8878 | 210  | 0.0143          |
| 0.0128        | 1.9327 | 215  | 0.0147          |
| 0.0149        | 1.9776 | 220  | 0.0143          |
| 0.0138        | 2.0247 | 225  | 0.0144          |
| 0.0127        | 2.0696 | 230  | 0.0143          |
| 0.0116        | 2.1145 | 235  | 0.0142          |
| 0.0128        | 2.1594 | 240  | 0.0143          |
| 0.0145        | 2.2043 | 245  | 0.0141          |
| 0.0147        | 2.2492 | 250  | 0.0139          |
| 0.0114        | 2.2941 | 255  | 0.0139          |
| 0.0114        | 2.3389 | 260  | 0.0139          |
| 0.0112        | 2.3838 | 265  | 0.0137          |
| 0.0105        | 2.4287 | 270  | 0.0138          |
| 0.0129        | 2.4736 | 275  | 0.0136          |
| 0.014         | 2.5185 | 280  | 0.0135          |
| 0.0124        | 2.5634 | 285  | 0.0136          |
| 0.0128        | 2.6083 | 290  | 0.0133          |
| 0.0106        | 2.6532 | 295  | 0.0129          |
| 0.0099        | 2.6981 | 300  | 0.0129          |
| 0.0111        | 2.7430 | 305  | 0.0129          |
| 0.0129        | 2.7879 | 310  | 0.0129          |
| 0.0088        | 2.8328 | 315  | 0.0129          |
| 0.0092        | 2.8777 | 320  | 0.0130          |
| 0.0086        | 2.9226 | 325  | 0.0129          |
| 0.0132        | 2.9675 | 330  | 0.0126          |
| 0.0126        | 3.0146 | 335  | 0.0130          |
| 0.0117        | 3.0595 | 340  | 0.0133          |
| 0.0102        | 3.1044 | 345  | 0.0132          |
| 0.0074        | 3.1493 | 350  | 0.0132          |
| 0.0105        | 3.1942 | 355  | 0.0129          |
| 0.0117        | 3.2391 | 360  | 0.0129          |
| 0.0107        | 3.2840 | 365  | 0.0127          |
| 0.0098        | 3.3288 | 370  | 0.0128          |
| 0.0092        | 3.3737 | 375  | 0.0127          |
| 0.0114        | 3.4186 | 380  | 0.0126          |
| 0.0118        | 3.4635 | 385  | 0.0125          |
| 0.0108        | 3.5084 | 390  | 0.0123          |
| 0.0092        | 3.5533 | 395  | 0.0123          |
| 0.0085        | 3.5982 | 400  | 0.0123          |
| 0.0088        | 3.6431 | 405  | 0.0126          |
| 0.0095        | 3.6880 | 410  | 0.0124          |
| 0.0072        | 3.7329 | 415  | 0.0124          |
| 0.0105        | 3.7778 | 420  | 0.0123          |
| 0.0115        | 3.8227 | 425  | 0.0122          |
| 0.007         | 3.8676 | 430  | 0.0121          |
| 0.0112        | 3.9125 | 435  | 0.0121          |
| 0.0103        | 3.9574 | 440  | 0.0121          |
| 0.0162        | 4.0045 | 445  | 0.0122          |
| 0.0079        | 4.0494 | 450  | 0.0125          |
| 0.0102        | 4.0943 | 455  | 0.0126          |
| 0.0087        | 4.1392 | 460  | 0.0126          |
| 0.0107        | 4.1841 | 465  | 0.0126          |
| 0.0105        | 4.2290 | 470  | 0.0125          |
| 0.0089        | 4.2738 | 475  | 0.0124          |
| 0.0061        | 4.3187 | 480  | 0.0125          |
| 0.0074        | 4.3636 | 485  | 0.0126          |
| 0.008         | 4.4085 | 490  | 0.0126          |
| 0.0092        | 4.4534 | 495  | 0.0125          |
| 0.0092        | 4.4983 | 500  | 0.0125          |
| 0.0061        | 4.5432 | 505  | 0.0124          |
| 0.0089        | 4.5881 | 510  | 0.0124          |
| 0.01          | 4.6330 | 515  | 0.0124          |
| 0.0081        | 4.6779 | 520  | 0.0124          |
| 0.0072        | 4.7228 | 525  | 0.0124          |
| 0.0078        | 4.7677 | 530  | 0.0124          |
| 0.009         | 4.8126 | 535  | 0.0124          |
| 0.0106        | 4.8575 | 540  | 0.0124          |
| 0.0079        | 4.9024 | 545  | 0.0124          |
| 0.0082        | 4.9473 | 550  | 0.0124          |
| 0.0082        | 4.9921 | 555  | 0.0124          |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.1
  • PyTorch 2.4.0+cu121
  • Datasets 3.1.0
  • Tokenizers 0.20.3
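
To reproduce the environment, the versions above map to pins along these lines (a sketch; the PyTorch CUDA build tag and any package extras are assumptions):

```
peft==0.12.0
transformers==4.46.1
torch==2.4.0        # the cu121 build, per "2.4.0+cu121" above
datasets==3.1.0
tokenizers==0.20.3
```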