llm3br256

This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on the relianceada-oneshot-train dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0057
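
This checkpoint is a PEFT adapter rather than a full set of model weights, so it is loaded on top of the base model. The snippet below is a minimal usage sketch, assuming the adapter is published as neel-nanonets/relianceada and that you have access to the gated meta-llama/Llama-3.2-3B-Instruct base model; adjust the identifiers to your setup.

```python
# Minimal usage sketch (assumption: the adapter repo id is "neel-nanonets/relianceada").
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id = "neel-nanonets/relianceada"  # assumed adapter repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

# Prompt through the Llama 3.2 chat template and decode only the newly generated tokens.
messages = [{"role": "user", "content": "Your prompt here."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```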

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent configuration sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5.0
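
For reference, these settings map roughly onto the following transformers TrainingArguments. This is an illustrative sketch only: output_dir is a placeholder, and the dataset loading and Trainer wiring from the original run are not shown.

```python
from transformers import TrainingArguments

# Illustrative reconstruction of the hyperparameters listed above;
# "llm3br256-out" is a placeholder output directory, not taken from the original run.
training_args = TrainingArguments(
    output_dir="llm3br256-out",
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,   # effective train batch size: 4 * 8 = 32
    num_train_epochs=5.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```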

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.0536 | 0.0291 | 5 | 0.0522 |
| 0.0383 | 0.0582 | 10 | 0.0333 |
| 0.0217 | 0.0873 | 15 | 0.0211 |
| 0.0192 | 0.1164 | 20 | 0.0160 |
| 0.0144 | 0.1456 | 25 | 0.0133 |
| 0.0127 | 0.1747 | 30 | 0.0119 |
| 0.0113 | 0.2038 | 35 | 0.0109 |
| 0.0115 | 0.2329 | 40 | 0.0104 |
| 0.0105 | 0.2620 | 45 | 0.0098 |
| 0.0099 | 0.2911 | 50 | 0.0093 |
| 0.0102 | 0.3202 | 55 | 0.0090 |
| 0.0092 | 0.3493 | 60 | 0.0087 |
| 0.0093 | 0.3785 | 65 | 0.0085 |
| 0.0084 | 0.4076 | 70 | 0.0083 |
| 0.0088 | 0.4367 | 75 | 0.0083 |
| 0.0084 | 0.4658 | 80 | 0.0079 |
| 0.0086 | 0.4949 | 85 | 0.0079 |
| 0.0079 | 0.5240 | 90 | 0.0078 |
| 0.0083 | 0.5531 | 95 | 0.0077 |
| 0.0086 | 0.5822 | 100 | 0.0078 |
| 0.0089 | 0.6114 | 105 | 0.0076 |
| 0.0075 | 0.6405 | 110 | 0.0076 |
| 0.0078 | 0.6696 | 115 | 0.0075 |
| 0.0079 | 0.6987 | 120 | 0.0074 |
| 0.0078 | 0.7278 | 125 | 0.0073 |
| 0.008 | 0.7569 | 130 | 0.0072 |
| 0.0077 | 0.7860 | 135 | 0.0070 |
| 0.0079 | 0.8151 | 140 | 0.0070 |
| 0.0071 | 0.8443 | 145 | 0.0070 |
| 0.0072 | 0.8734 | 150 | 0.0071 |
| 0.0076 | 0.9025 | 155 | 0.0070 |
| 0.0075 | 0.9316 | 160 | 0.0070 |
| 0.0074 | 0.9607 | 165 | 0.0069 |
| 0.0073 | 0.9898 | 170 | 0.0069 |
| 0.007 | 1.0189 | 175 | 0.0069 |
| 0.0072 | 1.0480 | 180 | 0.0069 |
| 0.0068 | 1.0771 | 185 | 0.0068 |
| 0.0067 | 1.1063 | 190 | 0.0069 |
| 0.0075 | 1.1354 | 195 | 0.0068 |
| 0.0072 | 1.1645 | 200 | 0.0068 |
| 0.0075 | 1.1936 | 205 | 0.0068 |
| 0.0066 | 1.2227 | 210 | 0.0067 |
| 0.0068 | 1.2518 | 215 | 0.0068 |
| 0.007 | 1.2809 | 220 | 0.0069 |
| 0.0065 | 1.3100 | 225 | 0.0068 |
| 0.0063 | 1.3392 | 230 | 0.0068 |
| 0.0068 | 1.3683 | 235 | 0.0067 |
| 0.0066 | 1.3974 | 240 | 0.0067 |
| 0.0063 | 1.4265 | 245 | 0.0068 |
| 0.0069 | 1.4556 | 250 | 0.0068 |
| 0.0068 | 1.4847 | 255 | 0.0067 |
| 0.0067 | 1.5138 | 260 | 0.0067 |
| 0.0063 | 1.5429 | 265 | 0.0065 |
| 0.0066 | 1.5721 | 270 | 0.0067 |
| 0.0063 | 1.6012 | 275 | 0.0066 |
| 0.0064 | 1.6303 | 280 | 0.0066 |
| 0.0066 | 1.6594 | 285 | 0.0066 |
| 0.0068 | 1.6885 | 290 | 0.0065 |
| 0.0065 | 1.7176 | 295 | 0.0064 |
| 0.0064 | 1.7467 | 300 | 0.0064 |
| 0.0068 | 1.7758 | 305 | 0.0064 |
| 0.0063 | 1.8049 | 310 | 0.0064 |
| 0.0067 | 1.8341 | 315 | 0.0064 |
| 0.0065 | 1.8632 | 320 | 0.0065 |
| 0.006 | 1.8923 | 325 | 0.0064 |
| 0.0064 | 1.9214 | 330 | 0.0064 |
| 0.0065 | 1.9505 | 335 | 0.0064 |
| 0.0061 | 1.9796 | 340 | 0.0063 |
| 0.006 | 2.0087 | 345 | 0.0063 |
| 0.0058 | 2.0378 | 350 | 0.0063 |
| 0.0057 | 2.0670 | 355 | 0.0063 |
| 0.0059 | 2.0961 | 360 | 0.0062 |
| 0.0061 | 2.1252 | 365 | 0.0063 |
| 0.006 | 2.1543 | 370 | 0.0063 |
| 0.0062 | 2.1834 | 375 | 0.0063 |
| 0.0064 | 2.2125 | 380 | 0.0063 |
| 0.006 | 2.2416 | 385 | 0.0062 |
| 0.0062 | 2.2707 | 390 | 0.0061 |
| 0.0061 | 2.2999 | 395 | 0.0062 |
| 0.0063 | 2.3290 | 400 | 0.0063 |
| 0.006 | 2.3581 | 405 | 0.0062 |
| 0.006 | 2.3872 | 410 | 0.0063 |
| 0.0057 | 2.4163 | 415 | 0.0062 |
| 0.0063 | 2.4454 | 420 | 0.0063 |
| 0.0065 | 2.4745 | 425 | 0.0062 |
| 0.006 | 2.5036 | 430 | 0.0062 |
| 0.0059 | 2.5328 | 435 | 0.0062 |
| 0.0058 | 2.5619 | 440 | 0.0062 |
| 0.0061 | 2.5910 | 445 | 0.0061 |
| 0.0061 | 2.6201 | 450 | 0.0061 |
| 0.0059 | 2.6492 | 455 | 0.0062 |
| 0.0057 | 2.6783 | 460 | 0.0062 |
| 0.0059 | 2.7074 | 465 | 0.0061 |
| 0.0058 | 2.7365 | 470 | 0.0062 |
| 0.0057 | 2.7656 | 475 | 0.0061 |
| 0.0058 | 2.7948 | 480 | 0.0061 |
| 0.0057 | 2.8239 | 485 | 0.0060 |
| 0.0059 | 2.8530 | 490 | 0.0060 |
| 0.0058 | 2.8821 | 495 | 0.0061 |
| 0.0059 | 2.9112 | 500 | 0.0060 |
| 0.0058 | 2.9403 | 505 | 0.0060 |
| 0.0057 | 2.9694 | 510 | 0.0061 |
| 0.0066 | 2.9985 | 515 | 0.0061 |
| 0.0055 | 3.0277 | 520 | 0.0060 |
| 0.005 | 3.0568 | 525 | 0.0060 |
| 0.0055 | 3.0859 | 530 | 0.0060 |
| 0.0054 | 3.1150 | 535 | 0.0060 |
| 0.0055 | 3.1441 | 540 | 0.0060 |
| 0.0056 | 3.1732 | 545 | 0.0060 |
| 0.0057 | 3.2023 | 550 | 0.0060 |
| 0.0058 | 3.2314 | 555 | 0.0060 |
| 0.0052 | 3.2606 | 560 | 0.0060 |
| 0.0058 | 3.2897 | 565 | 0.0060 |
| 0.0051 | 3.3188 | 570 | 0.0058 |
| 0.0051 | 3.3479 | 575 | 0.0059 |
| 0.0053 | 3.3770 | 580 | 0.0059 |
| 0.0053 | 3.4061 | 585 | 0.0059 |
| 0.0055 | 3.4352 | 590 | 0.0058 |
| 0.0051 | 3.4643 | 595 | 0.0059 |
| 0.0051 | 3.4934 | 600 | 0.0059 |
| 0.0055 | 3.5226 | 605 | 0.0059 |
| 0.0055 | 3.5517 | 610 | 0.0058 |
| 0.0055 | 3.5808 | 615 | 0.0058 |
| 0.0051 | 3.6099 | 620 | 0.0058 |
| 0.0054 | 3.6390 | 625 | 0.0057 |
| 0.0053 | 3.6681 | 630 | 0.0057 |
| 0.0052 | 3.6972 | 635 | 0.0057 |
| 0.0052 | 3.7263 | 640 | 0.0057 |
| 0.0052 | 3.7555 | 645 | 0.0058 |
| 0.0049 | 3.7846 | 650 | 0.0057 |
| 0.0055 | 3.8137 | 655 | 0.0057 |
| 0.0052 | 3.8428 | 660 | 0.0057 |
| 0.005 | 3.8719 | 665 | 0.0057 |
| 0.005 | 3.9010 | 670 | 0.0057 |
| 0.0051 | 3.9301 | 675 | 0.0057 |
| 0.0054 | 3.9592 | 680 | 0.0057 |
| 0.0052 | 3.9884 | 685 | 0.0057 |
| 0.0046 | 4.0175 | 690 | 0.0057 |
| 0.0047 | 4.0466 | 695 | 0.0057 |
| 0.0044 | 4.0757 | 700 | 0.0057 |
| 0.0047 | 4.1048 | 705 | 0.0057 |
| 0.0046 | 4.1339 | 710 | 0.0057 |
| 0.0046 | 4.1630 | 715 | 0.0057 |
| 0.0048 | 4.1921 | 720 | 0.0057 |
| 0.0047 | 4.2213 | 725 | 0.0057 |
| 0.0048 | 4.2504 | 730 | 0.0057 |
| 0.0047 | 4.2795 | 735 | 0.0057 |
| 0.0047 | 4.3086 | 740 | 0.0057 |
| 0.0046 | 4.3377 | 745 | 0.0057 |
| 0.0047 | 4.3668 | 750 | 0.0057 |
| 0.005 | 4.3959 | 755 | 0.0057 |
| 0.0043 | 4.4250 | 760 | 0.0057 |
| 0.0047 | 4.4541 | 765 | 0.0057 |
| 0.0047 | 4.4833 | 770 | 0.0057 |
| 0.0046 | 4.5124 | 775 | 0.0057 |
| 0.0047 | 4.5415 | 780 | 0.0057 |
| 0.0047 | 4.5706 | 785 | 0.0057 |
| 0.0048 | 4.5997 | 790 | 0.0057 |
| 0.0045 | 4.6288 | 795 | 0.0057 |
| 0.0045 | 4.6579 | 800 | 0.0057 |
| 0.0049 | 4.6870 | 805 | 0.0057 |
| 0.0045 | 4.7162 | 810 | 0.0057 |
| 0.0045 | 4.7453 | 815 | 0.0057 |
| 0.0044 | 4.7744 | 820 | 0.0057 |
| 0.0045 | 4.8035 | 825 | 0.0057 |
| 0.0046 | 4.8326 | 830 | 0.0057 |
| 0.0044 | 4.8617 | 835 | 0.0057 |
| 0.0044 | 4.8908 | 840 | 0.0057 |
| 0.0048 | 4.9199 | 845 | 0.0057 |
| 0.0047 | 4.9491 | 850 | 0.0057 |
| 0.0044 | 4.9782 | 855 | 0.0057 |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.1
  • Pytorch 2.4.0+cu121
  • Datasets 3.1.0
  • Tokenizers 0.20.3