collapse_gemma-2-9b_hs2_accumulate_iter4_sftsd1

This model is a fine-tuned version of google/gemma-2-9b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9485
  • Num Input Tokens Seen: 19153848

Model description

This is a fine-tune of google/gemma-2-9b, a 9.24B-parameter causal language model; the weights are published as BF16 Safetensors. Beyond that, more information is needed.

Intended uses & limitations

More information needed
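No intended use has been documented. Purely as an illustration, the checkpoint loads like any Hugging Face causal LM. The sketch below assumes a GPU with enough memory for a 9B-parameter model in BF16 and the `accelerate` package (for `device_map="auto"`); it is not an endorsed usage.

```python
# Minimal loading/generation sketch for this checkpoint (illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter4_sftsd1"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.bfloat16,  # weights are published in BF16
    device_map="auto",           # requires the accelerate package
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```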

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged sketch of how they map onto code follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
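
As a reading aid only, here is how these settings map onto `transformers.TrainingArguments`. The actual training script is not published; the `output_dir` and the comments are assumptions drawn from this card.

```python
# Hypothetical reconstruction of the configuration above, NOT the author's
# actual training script. Dataset loading and Trainer wiring are omitted.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter4_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=4,   # train_batch_size
    per_device_eval_batch_size=16,   # eval_batch_size
    seed=1,
    gradient_accumulation_steps=32,  # 4 x 32 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,               # lr_scheduler_warmup_ratio
    num_train_epochs=1,
    bf16=True,                       # weights are published in BF16
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 matches transformers'
    # defaults (adam_beta1, adam_beta2, adam_epsilon), so no override needed.
)
```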

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.2335          | 0                 |
| 1.3147        | 0.0133 | 5    | 1.1832          | 254068            |
| 1.0498        | 0.0265 | 10   | 1.0719          | 516552            |
| 0.9485        | 0.0398 | 15   | 1.0240          | 772872            |
| 0.702         | 0.0530 | 20   | 1.0216          | 1026756           |
| 0.5628        | 0.0663 | 25   | 1.0325          | 1281388           |
| 0.5407        | 0.0796 | 30   | 1.0239          | 1536964           |
| 0.368         | 0.0928 | 35   | 1.0172          | 1794688           |
| 0.3617        | 0.1061 | 40   | 1.0116          | 2049692           |
| 0.4112        | 0.1194 | 45   | 1.0076          | 2297888           |
| 0.3436        | 0.1326 | 50   | 1.0005          | 2555848           |
| 0.3919        | 0.1459 | 55   | 0.9977          | 2811876           |
| 0.3196        | 0.1591 | 60   | 0.9936          | 3062460           |
| 0.3427        | 0.1724 | 65   | 0.9888          | 3309344           |
| 0.3779        | 0.1857 | 70   | 0.9873          | 3558348           |
| 0.3086        | 0.1989 | 75   | 0.9835          | 3812356           |
| 0.3156        | 0.2122 | 80   | 0.9815          | 4067536           |
| 0.3243        | 0.2254 | 85   | 0.9801          | 4320128           |
| 0.353         | 0.2387 | 90   | 0.9767          | 4574252           |
| 0.315         | 0.2520 | 95   | 0.9759          | 4824900           |
| 0.2642        | 0.2652 | 100  | 0.9737          | 5082344           |
| 0.2696        | 0.2785 | 105  | 0.9740          | 5335240           |
| 0.3357        | 0.2918 | 110  | 0.9728          | 5593492           |
| 0.2995        | 0.3050 | 115  | 0.9700          | 5844120           |
| 0.2826        | 0.3183 | 120  | 0.9680          | 6100756           |
| 0.3509        | 0.3315 | 125  | 0.9657          | 6349476           |
| 0.2684        | 0.3448 | 130  | 0.9653          | 6605052           |
| 0.3148        | 0.3581 | 135  | 0.9654          | 6859396           |
| 0.2782        | 0.3713 | 140  | 0.9634          | 7111896           |
| 0.288         | 0.3846 | 145  | 0.9626          | 7365476           |
| 0.292         | 0.3978 | 150  | 0.9608          | 7623304           |
| 0.3483        | 0.4111 | 155  | 0.9601          | 7884416           |
| 0.2856        | 0.4244 | 160  | 0.9605          | 8142544           |
| 0.3717        | 0.4376 | 165  | 0.9597          | 8395160           |
| 0.258         | 0.4509 | 170  | 0.9583          | 8650004           |
| 0.3976        | 0.4642 | 175  | 0.9593          | 8900404           |
| 0.2696        | 0.4774 | 180  | 0.9585          | 9153460           |
| 0.3276        | 0.4907 | 185  | 0.9584          | 9401692           |
| 0.2893        | 0.5039 | 190  | 0.9571          | 9655896           |
| 0.3125        | 0.5172 | 195  | 0.9546          | 9917980           |
| 0.2823        | 0.5305 | 200  | 0.9554          | 10167660          |
| 0.3795        | 0.5437 | 205  | 0.9572          | 10416596          |
| 0.2921        | 0.5570 | 210  | 0.9555          | 10667880          |
| 0.2822        | 0.5702 | 215  | 0.9554          | 10922988          |
| 0.2741        | 0.5835 | 220  | 0.9562          | 11178148          |
| 0.2576        | 0.5968 | 225  | 0.9548          | 11431204          |
| 0.2774        | 0.6100 | 230  | 0.9549          | 11678968          |
| 0.3431        | 0.6233 | 235  | 0.9550          | 11938440          |
| 0.3598        | 0.6366 | 240  | 0.9552          | 12188988          |
| 0.321         | 0.6498 | 245  | 0.9537          | 12450232          |
| 0.3062        | 0.6631 | 250  | 0.9546          | 12702636          |
| 0.3252        | 0.6763 | 255  | 0.9538          | 12953464          |
| 0.2372        | 0.6896 | 260  | 0.9526          | 13213356          |
| 0.2677        | 0.7029 | 265  | 0.9538          | 13471308          |
| 0.2751        | 0.7161 | 270  | 0.9544          | 13719800          |
| 0.2369        | 0.7294 | 275  | 0.9540          | 13971304          |
| 0.2586        | 0.7426 | 280  | 0.9524          | 14218076          |
| 0.2612        | 0.7559 | 285  | 0.9531          | 14475828          |
| 0.3368        | 0.7692 | 290  | 0.9528          | 14736144          |
| 0.2279        | 0.7824 | 295  | 0.9513          | 14986776          |
| 0.2958        | 0.7957 | 300  | 0.9510          | 15241276          |
| 0.3639        | 0.8090 | 305  | 0.9527          | 15496724          |
| 0.3721        | 0.8222 | 310  | 0.9525          | 15751944          |
| 0.3751        | 0.8355 | 315  | 0.9516          | 16010436          |
| 0.3224        | 0.8487 | 320  | 0.9508          | 16264272          |
| 0.3244        | 0.8620 | 325  | 0.9492          | 16514404          |
| 0.2181        | 0.8753 | 330  | 0.9498          | 16771700          |
| 0.3845        | 0.8885 | 335  | 0.9521          | 17022500          |
| 0.2373        | 0.9018 | 340  | 0.9493          | 17281212          |
| 0.219         | 0.9150 | 345  | 0.9513          | 17531932          |
| 0.3645        | 0.9283 | 350  | 0.9508          | 17787156          |
| 0.3015        | 0.9416 | 355  | 0.9487          | 18042308          |
| 0.2971        | 0.9548 | 360  | 0.9504          | 18288852          |
| 0.2889        | 0.9681 | 365  | 0.9493          | 18546880          |
| 0.2924        | 0.9814 | 370  | 0.9493          | 18799684          |
| 0.2359        | 0.9946 | 375  | 0.9483          | 19052580          |

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
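
To check a local environment against this list, one can print the installed versions (a trivial sketch; the expected values in the comments come from the list above):

```python
# Print installed library versions to compare against the card's list.
import datasets, tokenizers, torch, transformers

print("Transformers:", transformers.__version__)  # card: 4.44.0
print("PyTorch:", torch.__version__)              # card: 2.4.0+cu121
print("Datasets:", datasets.__version__)          # card: 2.20.0
print("Tokenizers:", tokenizers.__version__)      # card: 0.19.1
```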