collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0917
  • Num Input Tokens Seen: 20879032
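
Since no usage details are provided, the snippet below is a minimal loading-and-generation sketch. It assumes the checkpoint is available on the Hugging Face Hub under the repo id RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd1; the prompt and generation settings are illustrative only.

```python
# Minimal sketch, not an official usage example from the model author.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)  # illustrative settings
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```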

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
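
For reference, the hyperparameters above map roughly onto a transformers TrainingArguments configuration as sketched below. The output directory and any options not listed above are assumptions, since the actual training script is not published in this card.

```python
# Hedged reconstruction of the listed hyperparameters; not the author's actual script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd1",  # assumed name
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,  # 8 x 16 = total train batch size of 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```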

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| No log | 0 | 0 | 1.3909 | 0 |
| 1.5067 | 0.0133 | 5 | 1.3748 | 273504 |
| 1.4081 | 0.0265 | 10 | 1.2830 | 545936 |
| 1.3549 | 0.0398 | 15 | 1.2063 | 825832 |
| 1.1392 | 0.0531 | 20 | 1.1670 | 1098248 |
| 1.0831 | 0.0663 | 25 | 1.1608 | 1374984 |
| 1.0367 | 0.0796 | 30 | 1.1570 | 1651888 |
| 0.8575 | 0.0929 | 35 | 1.1675 | 1925768 |
| 0.8087 | 0.1061 | 40 | 1.1749 | 2202208 |
| 0.8424 | 0.1194 | 45 | 1.1664 | 2483168 |
| 0.7382 | 0.1326 | 50 | 1.1891 | 2759744 |
| 0.6546 | 0.1459 | 55 | 1.1755 | 3041192 |
| 0.6009 | 0.1592 | 60 | 1.1720 | 3312936 |
| 0.65 | 0.1724 | 65 | 1.1656 | 3584968 |
| 0.513 | 0.1857 | 70 | 1.1583 | 3859624 |
| 0.5095 | 0.1990 | 75 | 1.1543 | 4142640 |
| 0.3864 | 0.2122 | 80 | 1.1539 | 4420432 |
| 0.5413 | 0.2255 | 85 | 1.1485 | 4695096 |
| 0.4637 | 0.2388 | 90 | 1.1475 | 4975288 |
| 0.5035 | 0.2520 | 95 | 1.1442 | 5257360 |
| 0.5414 | 0.2653 | 100 | 1.1402 | 5532816 |
| 0.5575 | 0.2786 | 105 | 1.1374 | 5814560 |
| 0.4326 | 0.2918 | 110 | 1.1351 | 6089288 |
| 0.3607 | 0.3051 | 115 | 1.1324 | 6365672 |
| 0.4166 | 0.3184 | 120 | 1.1335 | 6646232 |
| 0.4479 | 0.3316 | 125 | 1.1293 | 6920344 |
| 0.5133 | 0.3449 | 130 | 1.1298 | 7203792 |
| 0.3867 | 0.3581 | 135 | 1.1239 | 7492288 |
| 0.4439 | 0.3714 | 140 | 1.1289 | 7773672 |
| 0.4353 | 0.3847 | 145 | 1.1216 | 8058928 |
| 0.4172 | 0.3979 | 150 | 1.1244 | 8336984 |
| 0.3993 | 0.4112 | 155 | 1.1174 | 8618288 |
| 0.4248 | 0.4245 | 160 | 1.1228 | 8897568 |
| 0.4136 | 0.4377 | 165 | 1.1207 | 9173528 |
| 0.4214 | 0.4510 | 170 | 1.1164 | 9448368 |
| 0.4742 | 0.4643 | 175 | 1.1196 | 9724824 |
| 0.3857 | 0.4775 | 180 | 1.1164 | 10003152 |
| 0.3429 | 0.4908 | 185 | 1.1163 | 10283192 |
| 0.4161 | 0.5041 | 190 | 1.1166 | 10566080 |
| 0.4795 | 0.5173 | 195 | 1.1120 | 10841592 |
| 0.3835 | 0.5306 | 200 | 1.1116 | 11127760 |
| 0.3267 | 0.5439 | 205 | 1.1113 | 11398928 |
| 0.4626 | 0.5571 | 210 | 1.1093 | 11678328 |
| 0.3755 | 0.5704 | 215 | 1.1088 | 11958848 |
| 0.3646 | 0.5837 | 220 | 1.1075 | 12231888 |
| 0.4435 | 0.5969 | 225 | 1.1079 | 12502048 |
| 0.4098 | 0.6102 | 230 | 1.1070 | 12781520 |
| 0.3391 | 0.6234 | 235 | 1.1051 | 13060160 |
| 0.3454 | 0.6367 | 240 | 1.1053 | 13332944 |
| 0.4199 | 0.6500 | 245 | 1.1058 | 13607448 |
| 0.462 | 0.6632 | 250 | 1.1019 | 13886208 |
| 0.3375 | 0.6765 | 255 | 1.1056 | 14167776 |
| 0.3267 | 0.6898 | 260 | 1.1020 | 14443704 |
| 0.3554 | 0.7030 | 265 | 1.1009 | 14719960 |
| 0.3085 | 0.7163 | 270 | 1.1033 | 14994048 |
| 0.4255 | 0.7296 | 275 | 1.0997 | 15265616 |
| 0.4229 | 0.7428 | 280 | 1.1005 | 15537584 |
| 0.4453 | 0.7561 | 285 | 1.1007 | 15815648 |
| 0.2962 | 0.7694 | 290 | 1.0979 | 16092416 |
| 0.3443 | 0.7826 | 295 | 1.0984 | 16373224 |
| 0.3969 | 0.7959 | 300 | 1.0989 | 16651856 |
| 0.3985 | 0.8092 | 305 | 1.0961 | 16930456 |
| 0.3441 | 0.8224 | 310 | 1.0960 | 17210416 |
| 0.4218 | 0.8357 | 315 | 1.0953 | 17492744 |
| 0.3387 | 0.8489 | 320 | 1.0962 | 17770488 |
| 0.4258 | 0.8622 | 325 | 1.0944 | 18041344 |
| 0.3612 | 0.8755 | 330 | 1.0946 | 18322400 |
| 0.3957 | 0.8887 | 335 | 1.0933 | 18602800 |
| 0.414 | 0.9020 | 340 | 1.0927 | 18878536 |
| 0.4075 | 0.9153 | 345 | 1.0920 | 19160392 |
| 0.3823 | 0.9285 | 350 | 1.0920 | 19441032 |
| 0.3939 | 0.9418 | 355 | 1.0913 | 19714912 |
| 0.3477 | 0.9551 | 360 | 1.0919 | 19991552 |
| 0.3052 | 0.9683 | 365 | 1.0912 | 20265240 |
| 0.3146 | 0.9816 | 370 | 1.0887 | 20542976 |
| 0.3115 | 0.9949 | 375 | 1.0915 | 20823576 |
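
The validation-loss column corresponds to the model's loss on the (unpublished) evaluation set. The sketch below shows one way to compute a comparable cross-entropy number on your own held-out text; since the evaluation data is not released, the texts list is a placeholder.

```python
# Sketch of computing a per-example loss comparable to the "Validation Loss" column;
# the evaluation dataset itself is not published in this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd1"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
model.eval()

texts = ["Placeholder held-out example one.", "Placeholder held-out example two."]
losses = []
with torch.no_grad():
    for text in texts:
        enc = tokenizer(text, return_tensors="pt")
        out = model(**enc, labels=enc["input_ids"])  # labels trigger the LM loss
        losses.append(out.loss.item())
print(f"Mean eval loss: {sum(losses) / len(losses):.4f}")
```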

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1