llama_3b_step2_batch_v1

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5060
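
For intuition, if the reported loss is mean per-token cross-entropy in nats (the Transformers Trainer default for causal language models, which is an assumption here), it corresponds to a perplexity of about 1.66:

```python
import math

# Perplexity from the reported evaluation loss.
# Assumes mean per-token cross-entropy in nats (Transformers Trainer default).
eval_loss = 0.5060
print(f"perplexity = exp({eval_loss}) ≈ {math.exp(eval_loss):.2f}")  # ≈ 1.66
```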

Model description

More information needed

Intended uses & limitations

More information needed
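
No usage guidance is published yet. Assuming the checkpoint is hosted under the repo name in this card's title (the full user/org namespace is not shown here), it should load like any Transformers causal LM:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id assumed from the card title; prepend the actual user/org namespace.
model_id = "llama_3b_step2_batch_v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored as BF16 per this card
)

inputs = tokenizer("Hello, world!", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```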

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after the list):

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 40
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 4
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • num_epochs: 2
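
As an illustrative reconstruction only (the training script is not published), these settings map onto Transformers TrainingArguments roughly as follows; note that the total train batch size of 4 is the per-device batch size of 1 times the 4 gradient-accumulation steps. The output_dir and bf16 values are assumptions.

```python
from transformers import TrainingArguments

# Hypothetical mapping of the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="llama_3b_step2_batch_v1",  # assumed; not stated in the card
    learning_rate=1e-5,
    per_device_train_batch_size=1,   # train_batch_size
    per_device_eval_batch_size=40,   # eval_batch_size
    seed=42,
    gradient_accumulation_steps=4,   # total train batch size = 1 * 4 = 4
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=2,
    bf16=True,                       # assumed from the BF16 tensor type below
)
```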

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.0531 | 0.0170 | 50 | 1.2007 |
| 1.0336 | 0.0341 | 100 | 1.1242 |
| 0.9428 | 0.0511 | 150 | 1.0800 |
| 1.4386 | 0.0682 | 200 | 1.0408 |
| 0.8375 | 0.0852 | 250 | 1.0127 |
| 0.9193 | 0.1023 | 300 | 0.9817 |
| 1.0368 | 0.1193 | 350 | 0.9573 |
| 1.2018 | 0.1364 | 400 | 0.9319 |
| 1.2749 | 0.1534 | 450 | 0.9072 |
| 0.9881 | 0.1704 | 500 | 0.8820 |
| 0.9707 | 0.1875 | 550 | 0.8599 |
| 1.2377 | 0.2045 | 600 | 0.8412 |
| 0.9024 | 0.2216 | 650 | 0.8180 |
| 0.5889 | 0.2386 | 700 | 0.8024 |
| 0.8046 | 0.2557 | 750 | 0.7899 |
| 0.83 | 0.2727 | 800 | 0.7710 |
| 0.6852 | 0.2898 | 850 | 0.7548 |
| 0.8512 | 0.3068 | 900 | 0.7422 |
| 0.8377 | 0.3238 | 950 | 0.7345 |
| 0.5361 | 0.3409 | 1000 | 0.7220 |
| 0.7696 | 0.3579 | 1050 | 0.7105 |
| 0.8175 | 0.3750 | 1100 | 0.7013 |
| 0.6144 | 0.3920 | 1150 | 0.6886 |
| 0.3598 | 0.4091 | 1200 | 0.6809 |
| 0.7176 | 0.4261 | 1250 | 0.6692 |
| 0.5281 | 0.4432 | 1300 | 0.6644 |
| 0.3555 | 0.4602 | 1350 | 0.6547 |
| 0.9024 | 0.4772 | 1400 | 0.6471 |
| 0.7713 | 0.4943 | 1450 | 0.6386 |
| 0.6172 | 0.5113 | 1500 | 0.6322 |
| 0.6325 | 0.5284 | 1550 | 0.6266 |
| 0.7503 | 0.5454 | 1600 | 0.6206 |
| 0.349 | 0.5625 | 1650 | 0.6136 |
| 0.7 | 0.5795 | 1700 | 0.6085 |
| 0.5014 | 0.5966 | 1750 | 0.6023 |
| 0.6441 | 0.6136 | 1800 | 0.5975 |
| 0.5066 | 0.6306 | 1850 | 0.5921 |
| 0.6036 | 0.6477 | 1900 | 0.5883 |
| 0.6549 | 0.6647 | 1950 | 0.5840 |
| 0.3903 | 0.6818 | 2000 | 0.5789 |
| 0.8864 | 0.6988 | 2050 | 0.5754 |
| 0.7164 | 0.7159 | 2100 | 0.5709 |
| 0.5504 | 0.7329 | 2150 | 0.5687 |
| 0.4216 | 0.7500 | 2200 | 0.5646 |
| 0.4241 | 0.7670 | 2250 | 0.5618 |
| 0.6452 | 0.7840 | 2300 | 0.5590 |
| 0.7067 | 0.8011 | 2350 | 0.5558 |
| 0.4536 | 0.8181 | 2400 | 0.5537 |
| 0.8657 | 0.8352 | 2450 | 0.5508 |
| 0.7452 | 0.8522 | 2500 | 0.5483 |
| 0.3444 | 0.8693 | 2550 | 0.5458 |
| 0.2889 | 0.8863 | 2600 | 0.5437 |
| 0.2415 | 0.9034 | 2650 | 0.5401 |
| 0.5393 | 0.9204 | 2700 | 0.5385 |
| 0.4866 | 0.9374 | 2750 | 0.5372 |
| 0.9233 | 0.9545 | 2800 | 0.5347 |
| 0.4623 | 0.9715 | 2850 | 0.5318 |
| 0.4211 | 0.9886 | 2900 | 0.5299 |
| 0.4308 | 1.0056 | 2950 | 0.5283 |
| 0.618 | 1.0227 | 3000 | 0.5285 |
| 0.7693 | 1.0397 | 3050 | 0.5262 |
| 0.2893 | 1.0568 | 3100 | 0.5266 |
| 0.461 | 1.0738 | 3150 | 0.5273 |
| 0.3648 | 1.0908 | 3200 | 0.5230 |
| 0.4981 | 1.1079 | 3250 | 0.5253 |
| 0.5005 | 1.1249 | 3300 | 0.5222 |
| 0.4117 | 1.1420 | 3350 | 0.5217 |
| 0.3319 | 1.1590 | 3400 | 0.5188 |
| 0.2549 | 1.1761 | 3450 | 0.5190 |
| 0.3758 | 1.1931 | 3500 | 0.5186 |
| 0.2889 | 1.2102 | 3550 | 0.5173 |
| 0.6341 | 1.2272 | 3600 | 0.5167 |
| 0.3217 | 1.2442 | 3650 | 0.5155 |
| 0.4406 | 1.2613 | 3700 | 0.5150 |
| 0.7445 | 1.2783 | 3750 | 0.5148 |
| 0.5511 | 1.2954 | 3800 | 0.5133 |
| 0.3933 | 1.3124 | 3850 | 0.5125 |
| 0.39 | 1.3295 | 3900 | 0.5134 |
| 0.3015 | 1.3465 | 3950 | 0.5126 |
| 0.8124 | 1.3636 | 4000 | 0.5118 |
| 0.6512 | 1.3806 | 4050 | 0.5111 |
| 0.7011 | 1.3976 | 4100 | 0.5106 |
| 0.4556 | 1.4147 | 4150 | 0.5103 |
| 0.4563 | 1.4317 | 4200 | 0.5100 |
| 0.2651 | 1.4488 | 4250 | 0.5100 |
| 0.5674 | 1.4658 | 4300 | 0.5090 |
| 0.2869 | 1.4829 | 4350 | 0.5093 |
| 0.5327 | 1.4999 | 4400 | 0.5088 |
| 0.726 | 1.5170 | 4450 | 0.5086 |
| 0.2619 | 1.5340 | 4500 | 0.5084 |
| 0.6597 | 1.5510 | 4550 | 0.5081 |
| 0.4848 | 1.5681 | 4600 | 0.5083 |
| 0.412 | 1.5851 | 4650 | 0.5080 |
| 0.6712 | 1.6022 | 4700 | 0.5077 |
| 0.5523 | 1.6192 | 4750 | 0.5076 |
| 0.5105 | 1.6363 | 4800 | 0.5077 |
| 0.5315 | 1.6533 | 4850 | 0.5071 |
| 0.4166 | 1.6704 | 4900 | 0.5069 |
| 0.4081 | 1.6874 | 4950 | 0.5065 |
| 0.3154 | 1.7044 | 5000 | 0.5063 |
| 0.396 | 1.7215 | 5050 | 0.5063 |
| 0.6121 | 1.7385 | 5100 | 0.5064 |
| 0.379 | 1.7556 | 5150 | 0.5063 |
| 0.4534 | 1.7726 | 5200 | 0.5061 |
| 0.5572 | 1.7897 | 5250 | 0.5060 |
| 0.3847 | 1.8067 | 5300 | 0.5059 |
| 0.3751 | 1.8238 | 5350 | 0.5060 |
| 0.4346 | 1.8408 | 5400 | 0.5061 |
| 0.4928 | 1.8578 | 5450 | 0.5061 |
| 0.5215 | 1.8749 | 5500 | 0.5060 |
| 0.6156 | 1.8919 | 5550 | 0.5060 |
| 0.4041 | 1.9090 | 5600 | 0.5060 |
| 0.5604 | 1.9260 | 5650 | 0.5059 |
| 0.424 | 1.9431 | 5700 | 0.5060 |
| 0.1856 | 1.9601 | 5750 | 0.5060 |
| 0.3701 | 1.9772 | 5800 | 0.5061 |
| 0.4201 | 1.9942 | 5850 | 0.5060 |

Framework versions

  • Transformers 4.46.1
  • PyTorch 2.1.0+cu118
  • Datasets 3.0.2
  • Tokenizers 0.20.1
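
To sanity-check an environment against these versions, here is a small convenience sketch (exact matches are not strictly required, but large version gaps can change behavior):

```python
import datasets
import tokenizers
import torch
import transformers

# Versions reported in this card vs. the ones currently installed.
expected = {
    "transformers": "4.46.1",
    "torch": "2.1.0+cu118",
    "datasets": "3.0.2",
    "tokenizers": "0.20.1",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    have = installed[name]
    status = "OK" if have == want else f"differs (card used {want})"
    print(f"{name}: {have} -> {status}")
```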

Model details

  • Format: Safetensors
  • Model size: 3.21B params
  • Tensor type: BF16