collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1048
  • Num Input Tokens Seen: 30937840
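
For reference, here is a minimal loading sketch using the Transformers library. It is not part of the original card: it assumes the checkpoint is published as RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd2 on the Hugging Face Hub (about 2.61B parameters stored as BF16 Safetensors) and that any access requirements for the gated google/gemma-2-2b base model have been met.

```python
# Minimal inference sketch (illustrative, not from the original card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint weights are stored in BF16
    device_map="auto",           # requires the accelerate package
)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```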

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; an illustrative configuration sketch follows the list:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
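
These values map directly onto the Hugging Face TrainingArguments API. The sketch below only illustrates how the reported settings fit together and is not the original training script: it assumes the standard Trainer was used on a single device, so that the effective (total) train batch size is train_batch_size × gradient_accumulation_steps = 8 × 16 = 128, and it assumes the "Adam" optimizer above refers to the AdamW variant that Trainer configures by default.

```python
# Illustrative reconstruction of the reported hyperparameters with
# transformers.TrainingArguments; not the original training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd2",  # assumed output path
    learning_rate=8e-06,
    per_device_train_batch_size=8,    # train_batch_size
    per_device_eval_batch_size=16,    # eval_batch_size
    gradient_accumulation_steps=16,   # 8 * 16 = 128 total train batch size on one device
    seed=2,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    bf16=True,                        # assumption, based on the BF16 checkpoint dtype
    eval_strategy="steps",            # assumption: matches the 5-step evaluation cadence below
    eval_steps=5,
    logging_steps=5,
)
```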

Training results

Training Loss Epoch Step Validation Loss Input Tokens Seen
No log 0 0 1.3909 0
1.5329 0.0088 5 1.3818 275656
1.5828 0.0177 10 1.3192 552624
1.428 0.0265 15 1.2548 826024
1.2605 0.0353 20 1.1988 1101872
1.128 0.0441 25 1.1751 1372272
1.1142 0.0530 30 1.1723 1649368
1.0356 0.0618 35 1.1729 1923640
0.8851 0.0706 40 1.1976 2191160
0.8191 0.0794 45 1.2022 2461680
0.7443 0.0883 50 1.2214 2734480
0.6972 0.0971 55 1.2002 3009552
0.6345 0.1059 60 1.1883 3280536
0.6249 0.1148 65 1.1948 3550800
0.5123 0.1236 70 1.1961 3821736
0.4661 0.1324 75 1.1938 4096488
0.491 0.1412 80 1.1864 4365312
0.4568 0.1501 85 1.1824 4633488
0.4108 0.1589 90 1.1803 4908632
0.4298 0.1677 95 1.1849 5184840
0.3863 0.1765 100 1.1807 5459872
0.469 0.1854 105 1.1808 5733384
0.4112 0.1942 110 1.1745 6008360
0.4328 0.2030 115 1.1731 6279928
0.4686 0.2119 120 1.1719 6549560
0.4254 0.2207 125 1.1650 6820064
0.4131 0.2295 130 1.1679 7092784
0.3801 0.2383 135 1.1606 7373688
0.3755 0.2472 140 1.1660 7644816
0.372 0.2560 145 1.1577 7921944
0.376 0.2648 150 1.1549 8196384
0.3551 0.2736 155 1.1556 8467640
0.3409 0.2825 160 1.1497 8746104
0.3663 0.2913 165 1.1528 9022632
0.3558 0.3001 170 1.1509 9298176
0.3088 0.3089 175 1.1524 9575872
0.3736 0.3178 180 1.1473 9841528
0.3377 0.3266 185 1.1481 10116520
0.3473 0.3354 190 1.1428 10394896
0.3137 0.3443 195 1.1425 10665848
0.3163 0.3531 200 1.1433 10939128
0.2973 0.3619 205 1.1410 11212464
0.3062 0.3707 210 1.1446 11480560
0.4125 0.3796 215 1.1397 11759744
0.3505 0.3884 220 1.1423 12033656
0.3489 0.3972 225 1.1403 12304952
0.265 0.4060 230 1.1346 12573520
0.2683 0.4149 235 1.1399 12844072
0.2863 0.4237 240 1.1370 13114088
0.2612 0.4325 245 1.1384 13391416
0.3089 0.4414 250 1.1348 13665888
0.2451 0.4502 255 1.1337 13934272
0.3628 0.4590 260 1.1334 14210656
0.3143 0.4678 265 1.1321 14481944
0.2468 0.4767 270 1.1317 14748960
0.3403 0.4855 275 1.1282 15025096
0.3069 0.4943 280 1.1276 15294856
0.3461 0.5031 285 1.1277 15568080
0.2733 0.5120 290 1.1283 15837368
0.3364 0.5208 295 1.1265 16106872
0.3107 0.5296 300 1.1228 16382760
0.2594 0.5385 305 1.1277 16651328
0.3674 0.5473 310 1.1237 16921656
0.2966 0.5561 315 1.1227 17201416
0.2795 0.5649 320 1.1247 17480400
0.3032 0.5738 325 1.1228 17754296
0.268 0.5826 330 1.1208 18024456
0.2329 0.5914 335 1.1225 18296232
0.293 0.6002 340 1.1196 18568008
0.2789 0.6091 345 1.1186 18842272
0.3291 0.6179 350 1.1215 19118304
0.3131 0.6267 355 1.1179 19396528
0.2905 0.6356 360 1.1180 19667944
0.3705 0.6444 365 1.1168 19942280
0.3211 0.6532 370 1.1155 20213920
0.3426 0.6620 375 1.1159 20488320
0.2674 0.6709 380 1.1158 20761120
0.2985 0.6797 385 1.1161 21039632
0.2743 0.6885 390 1.1135 21308888
0.2949 0.6973 395 1.1175 21583640
0.2632 0.7062 400 1.1148 21861936
0.3536 0.7150 405 1.1137 22141144
0.3069 0.7238 410 1.1147 22415856
0.2709 0.7326 415 1.1140 22687328
0.2526 0.7415 420 1.1131 22960104
0.2865 0.7503 425 1.1115 23234496
0.4072 0.7591 430 1.1117 23501504
0.3175 0.7680 435 1.1102 23773256
0.2798 0.7768 440 1.1101 24046768
0.3312 0.7856 445 1.1094 24323872
0.3448 0.7944 450 1.1098 24602456
0.2342 0.8033 455 1.1093 24873976
0.3352 0.8121 460 1.1091 25142944
0.2058 0.8209 465 1.1062 25422584
0.3473 0.8297 470 1.1066 25702288
0.3227 0.8386 475 1.1085 25972656
0.2548 0.8474 480 1.1072 26241896
0.2785 0.8562 485 1.1055 26513264
0.3941 0.8651 490 1.1053 26788024
0.2188 0.8739 495 1.1053 27060584
0.2283 0.8827 500 1.1057 27330952
0.316 0.8915 505 1.1054 27602456
0.2504 0.9004 510 1.1046 27873592
0.3032 0.9092 515 1.1029 28149760
0.3913 0.9180 520 1.1042 28429672
0.3072 0.9268 525 1.1044 28700704
0.2355 0.9357 530 1.1026 28972856
0.2685 0.9445 535 1.1023 29244952
0.2743 0.9533 540 1.1032 29521872
0.2402 0.9622 545 1.1006 29798312
0.263 0.9710 550 1.1012 30071680
0.3205 0.9798 555 1.1012 30341752
0.2768 0.9886 560 1.1006 30611208
0.3064 0.9975 565 1.1042 30884496

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
