# collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd2
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1048
- Num Input Tokens Seen: 30937840
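
For quick experimentation, here is a minimal sketch of loading this checkpoint through the standard `transformers` API; the prompt and generation settings are illustrative only, not recommendations from the model author:

```python
# Minimal sketch: load this checkpoint with the standard transformers API.
# Prompt and generation settings are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd2"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```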
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
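
For readers who want to approximate this setup, the list above maps onto `transformers.TrainingArguments` roughly as follows. This is a sketch only: the output path is a hypothetical placeholder, and the actual training script is not published.

```python
# Sketch only: the hyperparameters above expressed as TrainingArguments.
# The output_dir is a hypothetical placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd2",  # hypothetical
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # 8 * 16 = total train batch size of 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,    # Adam betas and epsilon as reported above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```

Note that the reported total train batch size of 128 follows from `per_device_train_batch_size * gradient_accumulation_steps` (8 × 16), assuming a single device.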
### Training results
| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3909 | 0 |
| 1.5329 | 0.0088 | 5 | 1.3818 | 275656 |
| 1.5828 | 0.0177 | 10 | 1.3192 | 552624 |
| 1.428 | 0.0265 | 15 | 1.2548 | 826024 |
| 1.2605 | 0.0353 | 20 | 1.1988 | 1101872 |
| 1.128 | 0.0441 | 25 | 1.1751 | 1372272 |
| 1.1142 | 0.0530 | 30 | 1.1723 | 1649368 |
| 1.0356 | 0.0618 | 35 | 1.1729 | 1923640 |
| 0.8851 | 0.0706 | 40 | 1.1976 | 2191160 |
| 0.8191 | 0.0794 | 45 | 1.2022 | 2461680 |
| 0.7443 | 0.0883 | 50 | 1.2214 | 2734480 |
| 0.6972 | 0.0971 | 55 | 1.2002 | 3009552 |
| 0.6345 | 0.1059 | 60 | 1.1883 | 3280536 |
| 0.6249 | 0.1148 | 65 | 1.1948 | 3550800 |
| 0.5123 | 0.1236 | 70 | 1.1961 | 3821736 |
| 0.4661 | 0.1324 | 75 | 1.1938 | 4096488 |
| 0.491 | 0.1412 | 80 | 1.1864 | 4365312 |
| 0.4568 | 0.1501 | 85 | 1.1824 | 4633488 |
| 0.4108 | 0.1589 | 90 | 1.1803 | 4908632 |
| 0.4298 | 0.1677 | 95 | 1.1849 | 5184840 |
| 0.3863 | 0.1765 | 100 | 1.1807 | 5459872 |
| 0.469 | 0.1854 | 105 | 1.1808 | 5733384 |
| 0.4112 | 0.1942 | 110 | 1.1745 | 6008360 |
| 0.4328 | 0.2030 | 115 | 1.1731 | 6279928 |
| 0.4686 | 0.2119 | 120 | 1.1719 | 6549560 |
| 0.4254 | 0.2207 | 125 | 1.1650 | 6820064 |
| 0.4131 | 0.2295 | 130 | 1.1679 | 7092784 |
| 0.3801 | 0.2383 | 135 | 1.1606 | 7373688 |
| 0.3755 | 0.2472 | 140 | 1.1660 | 7644816 |
| 0.372 | 0.2560 | 145 | 1.1577 | 7921944 |
| 0.376 | 0.2648 | 150 | 1.1549 | 8196384 |
| 0.3551 | 0.2736 | 155 | 1.1556 | 8467640 |
| 0.3409 | 0.2825 | 160 | 1.1497 | 8746104 |
| 0.3663 | 0.2913 | 165 | 1.1528 | 9022632 |
| 0.3558 | 0.3001 | 170 | 1.1509 | 9298176 |
| 0.3088 | 0.3089 | 175 | 1.1524 | 9575872 |
| 0.3736 | 0.3178 | 180 | 1.1473 | 9841528 |
| 0.3377 | 0.3266 | 185 | 1.1481 | 10116520 |
| 0.3473 | 0.3354 | 190 | 1.1428 | 10394896 |
| 0.3137 | 0.3443 | 195 | 1.1425 | 10665848 |
| 0.3163 | 0.3531 | 200 | 1.1433 | 10939128 |
| 0.2973 | 0.3619 | 205 | 1.1410 | 11212464 |
| 0.3062 | 0.3707 | 210 | 1.1446 | 11480560 |
| 0.4125 | 0.3796 | 215 | 1.1397 | 11759744 |
| 0.3505 | 0.3884 | 220 | 1.1423 | 12033656 |
| 0.3489 | 0.3972 | 225 | 1.1403 | 12304952 |
| 0.265 | 0.4060 | 230 | 1.1346 | 12573520 |
| 0.2683 | 0.4149 | 235 | 1.1399 | 12844072 |
| 0.2863 | 0.4237 | 240 | 1.1370 | 13114088 |
| 0.2612 | 0.4325 | 245 | 1.1384 | 13391416 |
| 0.3089 | 0.4414 | 250 | 1.1348 | 13665888 |
| 0.2451 | 0.4502 | 255 | 1.1337 | 13934272 |
| 0.3628 | 0.4590 | 260 | 1.1334 | 14210656 |
| 0.3143 | 0.4678 | 265 | 1.1321 | 14481944 |
| 0.2468 | 0.4767 | 270 | 1.1317 | 14748960 |
| 0.3403 | 0.4855 | 275 | 1.1282 | 15025096 |
| 0.3069 | 0.4943 | 280 | 1.1276 | 15294856 |
| 0.3461 | 0.5031 | 285 | 1.1277 | 15568080 |
| 0.2733 | 0.5120 | 290 | 1.1283 | 15837368 |
| 0.3364 | 0.5208 | 295 | 1.1265 | 16106872 |
| 0.3107 | 0.5296 | 300 | 1.1228 | 16382760 |
| 0.2594 | 0.5385 | 305 | 1.1277 | 16651328 |
| 0.3674 | 0.5473 | 310 | 1.1237 | 16921656 |
| 0.2966 | 0.5561 | 315 | 1.1227 | 17201416 |
| 0.2795 | 0.5649 | 320 | 1.1247 | 17480400 |
| 0.3032 | 0.5738 | 325 | 1.1228 | 17754296 |
| 0.268 | 0.5826 | 330 | 1.1208 | 18024456 |
| 0.2329 | 0.5914 | 335 | 1.1225 | 18296232 |
| 0.293 | 0.6002 | 340 | 1.1196 | 18568008 |
| 0.2789 | 0.6091 | 345 | 1.1186 | 18842272 |
| 0.3291 | 0.6179 | 350 | 1.1215 | 19118304 |
| 0.3131 | 0.6267 | 355 | 1.1179 | 19396528 |
| 0.2905 | 0.6356 | 360 | 1.1180 | 19667944 |
| 0.3705 | 0.6444 | 365 | 1.1168 | 19942280 |
| 0.3211 | 0.6532 | 370 | 1.1155 | 20213920 |
| 0.3426 | 0.6620 | 375 | 1.1159 | 20488320 |
| 0.2674 | 0.6709 | 380 | 1.1158 | 20761120 |
| 0.2985 | 0.6797 | 385 | 1.1161 | 21039632 |
| 0.2743 | 0.6885 | 390 | 1.1135 | 21308888 |
| 0.2949 | 0.6973 | 395 | 1.1175 | 21583640 |
| 0.2632 | 0.7062 | 400 | 1.1148 | 21861936 |
| 0.3536 | 0.7150 | 405 | 1.1137 | 22141144 |
| 0.3069 | 0.7238 | 410 | 1.1147 | 22415856 |
| 0.2709 | 0.7326 | 415 | 1.1140 | 22687328 |
| 0.2526 | 0.7415 | 420 | 1.1131 | 22960104 |
| 0.2865 | 0.7503 | 425 | 1.1115 | 23234496 |
| 0.4072 | 0.7591 | 430 | 1.1117 | 23501504 |
| 0.3175 | 0.7680 | 435 | 1.1102 | 23773256 |
| 0.2798 | 0.7768 | 440 | 1.1101 | 24046768 |
| 0.3312 | 0.7856 | 445 | 1.1094 | 24323872 |
| 0.3448 | 0.7944 | 450 | 1.1098 | 24602456 |
| 0.2342 | 0.8033 | 455 | 1.1093 | 24873976 |
| 0.3352 | 0.8121 | 460 | 1.1091 | 25142944 |
| 0.2058 | 0.8209 | 465 | 1.1062 | 25422584 |
| 0.3473 | 0.8297 | 470 | 1.1066 | 25702288 |
| 0.3227 | 0.8386 | 475 | 1.1085 | 25972656 |
| 0.2548 | 0.8474 | 480 | 1.1072 | 26241896 |
| 0.2785 | 0.8562 | 485 | 1.1055 | 26513264 |
| 0.3941 | 0.8651 | 490 | 1.1053 | 26788024 |
| 0.2188 | 0.8739 | 495 | 1.1053 | 27060584 |
| 0.2283 | 0.8827 | 500 | 1.1057 | 27330952 |
| 0.316 | 0.8915 | 505 | 1.1054 | 27602456 |
| 0.2504 | 0.9004 | 510 | 1.1046 | 27873592 |
| 0.3032 | 0.9092 | 515 | 1.1029 | 28149760 |
| 0.3913 | 0.9180 | 520 | 1.1042 | 28429672 |
| 0.3072 | 0.9268 | 525 | 1.1044 | 28700704 |
| 0.2355 | 0.9357 | 530 | 1.1026 | 28972856 |
| 0.2685 | 0.9445 | 535 | 1.1023 | 29244952 |
| 0.2743 | 0.9533 | 540 | 1.1032 | 29521872 |
| 0.2402 | 0.9622 | 545 | 1.1006 | 29798312 |
| 0.263 | 0.9710 | 550 | 1.1012 | 30071680 |
| 0.3205 | 0.9798 | 555 | 1.1012 | 30341752 |
| 0.2768 | 0.9886 | 560 | 1.1006 | 30611208 |
| 0.3064 | 0.9975 | 565 | 1.1042 | 30884496 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
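
A quick way to check a local environment against these pins is a minimal sketch like the following; the expected values come from the list above:

```python
# Verify installed library versions against those reported above.
# Install pins (run in a shell), e.g.:
#   pip install transformers==4.44.0 datasets==2.20.0 tokenizers==0.19.1
import transformers, torch, datasets, tokenizers

print(transformers.__version__)  # expected: 4.44.0
print(torch.__version__)         # expected: 2.4.0+cu121
print(datasets.__version__)      # expected: 2.20.0
print(tokenizers.__version__)    # expected: 0.19.1
```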