# collapse_gemma-2-2b_hs2_accumulate_iter12_sftsd0

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 1.0961
- Num Input Tokens Seen: 62849584
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128 (train_batch_size × gradient_accumulation_steps = 8 × 16)
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
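As a quick consistency check, the effective batch size and warmup length implied by the hyperparameters above can be derived. This is a sketch: the warmup-step formula assumes the `transformers` Trainer's default `ceil(warmup_ratio × total_steps)` rounding, and the total of 1145 optimizer steps is taken from the final row of the results table.

```python
import math

# Hyperparameters listed above
train_batch_size = 8
gradient_accumulation_steps = 16
warmup_ratio = 0.05
total_steps = 1145  # final step in the training-results table

# Effective (total) train batch size per optimizer step
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 128, matching the value reported above

# Warmup length for constant_with_warmup, assuming Trainer's ceil rounding
warmup_steps = math.ceil(warmup_ratio * total_steps)
print(warmup_steps)  # 58
```

After these ~58 warmup steps, `constant_with_warmup` holds the learning rate flat at 8e-06 for the rest of the epoch.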
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.3909 | 0 |
1.5935 | 0.0044 | 5 | 1.3888 | 271720 |
1.7331 | 0.0087 | 10 | 1.3715 | 550016 |
1.6855 | 0.0131 | 15 | 1.3340 | 822688 |
1.4762 | 0.0175 | 20 | 1.2787 | 1099080 |
1.4865 | 0.0218 | 25 | 1.2415 | 1370256 |
1.322 | 0.0262 | 30 | 1.2101 | 1640888 |
1.1677 | 0.0306 | 35 | 1.1925 | 1917336 |
1.1431 | 0.0349 | 40 | 1.2149 | 2193704 |
0.936 | 0.0393 | 45 | 1.2195 | 2475144 |
0.8429 | 0.0437 | 50 | 1.2590 | 2756336 |
0.6814 | 0.0480 | 55 | 1.2629 | 3035416 |
0.5116 | 0.0524 | 60 | 1.3132 | 3308832 |
0.3906 | 0.0568 | 65 | 1.2951 | 3577088 |
0.4331 | 0.0611 | 70 | 1.2602 | 3848632 |
0.3917 | 0.0655 | 75 | 1.2526 | 4118048 |
0.3345 | 0.0699 | 80 | 1.2279 | 4386472 |
0.2911 | 0.0742 | 85 | 1.2170 | 4659192 |
0.2479 | 0.0786 | 90 | 1.2142 | 4934896 |
0.1603 | 0.0830 | 95 | 1.2066 | 5203120 |
0.216 | 0.0873 | 100 | 1.1965 | 5475672 |
0.1852 | 0.0917 | 105 | 1.1977 | 5748152 |
0.2187 | 0.0961 | 110 | 1.1885 | 6025512 |
0.3007 | 0.1004 | 115 | 1.1899 | 6308600 |
0.1872 | 0.1048 | 120 | 1.1869 | 6582704 |
0.188 | 0.1092 | 125 | 1.1849 | 6856744 |
0.272 | 0.1135 | 130 | 1.1744 | 7138312 |
0.2329 | 0.1179 | 135 | 1.1824 | 7413352 |
0.2307 | 0.1223 | 140 | 1.1766 | 7690248 |
0.1938 | 0.1266 | 145 | 1.1730 | 7968448 |
0.2175 | 0.1310 | 150 | 1.1758 | 8245344 |
0.2381 | 0.1354 | 155 | 1.1690 | 8523112 |
0.1299 | 0.1397 | 160 | 1.1697 | 8805744 |
0.2116 | 0.1441 | 165 | 1.1601 | 9081792 |
0.1385 | 0.1485 | 170 | 1.1636 | 9356808 |
0.2319 | 0.1528 | 175 | 1.1644 | 9629816 |
0.171 | 0.1572 | 180 | 1.1565 | 9918272 |
0.2012 | 0.1616 | 185 | 1.1609 | 10193336 |
0.24 | 0.1659 | 190 | 1.1584 | 10472144 |
0.1798 | 0.1703 | 195 | 1.1575 | 10737552 |
0.2302 | 0.1747 | 200 | 1.1562 | 11012904 |
0.1482 | 0.1790 | 205 | 1.1571 | 11294024 |
0.1398 | 0.1834 | 210 | 1.1567 | 11572032 |
0.2569 | 0.1878 | 215 | 1.1559 | 11847680 |
0.2052 | 0.1921 | 220 | 1.1525 | 12114904 |
0.116 | 0.1965 | 225 | 1.1524 | 12390200 |
0.1802 | 0.2009 | 230 | 1.1535 | 12666000 |
0.157 | 0.2052 | 235 | 1.1510 | 12940280 |
0.1798 | 0.2096 | 240 | 1.1529 | 13214720 |
0.1697 | 0.2140 | 245 | 1.1490 | 13488056 |
0.1401 | 0.2183 | 250 | 1.1527 | 13761760 |
0.1112 | 0.2227 | 255 | 1.1511 | 14033400 |
0.1377 | 0.2270 | 260 | 1.1506 | 14311080 |
0.201 | 0.2314 | 265 | 1.1500 | 14585624 |
0.2307 | 0.2358 | 270 | 1.1478 | 14860104 |
0.2215 | 0.2401 | 275 | 1.1482 | 15136784 |
0.2379 | 0.2445 | 280 | 1.1437 | 15408480 |
0.1741 | 0.2489 | 285 | 1.1432 | 15677760 |
0.1597 | 0.2532 | 290 | 1.1441 | 15948136 |
0.2122 | 0.2576 | 295 | 1.1446 | 16232512 |
0.212 | 0.2620 | 300 | 1.1407 | 16506600 |
0.2287 | 0.2663 | 305 | 1.1417 | 16774512 |
0.2402 | 0.2707 | 310 | 1.1360 | 17051056 |
0.1399 | 0.2751 | 315 | 1.1362 | 17321400 |
0.2499 | 0.2794 | 320 | 1.1383 | 17596504 |
0.2073 | 0.2838 | 325 | 1.1400 | 17871840 |
0.1783 | 0.2882 | 330 | 1.1353 | 18146032 |
0.1699 | 0.2925 | 335 | 1.1347 | 18414688 |
0.2843 | 0.2969 | 340 | 1.1333 | 18690864 |
0.1551 | 0.3013 | 345 | 1.1365 | 18954472 |
0.2126 | 0.3056 | 350 | 1.1363 | 19226776 |
0.2148 | 0.3100 | 355 | 1.1354 | 19507072 |
0.1507 | 0.3144 | 360 | 1.1338 | 19776992 |
0.1366 | 0.3187 | 365 | 1.1355 | 20057112 |
0.1917 | 0.3231 | 370 | 1.1337 | 20338336 |
0.1653 | 0.3275 | 375 | 1.1342 | 20615656 |
0.1371 | 0.3318 | 380 | 1.1312 | 20887456 |
0.1944 | 0.3362 | 385 | 1.1297 | 21164312 |
0.1273 | 0.3406 | 390 | 1.1294 | 21439952 |
0.1793 | 0.3449 | 395 | 1.1298 | 21712112 |
0.1928 | 0.3493 | 400 | 1.1286 | 21987808 |
0.1705 | 0.3537 | 405 | 1.1285 | 22260096 |
0.1545 | 0.3580 | 410 | 1.1279 | 22539240 |
0.1836 | 0.3624 | 415 | 1.1299 | 22812528 |
0.2266 | 0.3668 | 420 | 1.1254 | 23087856 |
0.2331 | 0.3711 | 425 | 1.1256 | 23361464 |
0.1839 | 0.3755 | 430 | 1.1256 | 23642200 |
0.1458 | 0.3799 | 435 | 1.1250 | 23912416 |
0.1316 | 0.3842 | 440 | 1.1261 | 24189640 |
0.1576 | 0.3886 | 445 | 1.1254 | 24466896 |
0.1636 | 0.3930 | 450 | 1.1245 | 24742272 |
0.1343 | 0.3973 | 455 | 1.1247 | 25014328 |
0.2099 | 0.4017 | 460 | 1.1257 | 25289184 |
0.1554 | 0.4061 | 465 | 1.1268 | 25558392 |
0.1986 | 0.4104 | 470 | 1.1270 | 25833512 |
0.1433 | 0.4148 | 475 | 1.1223 | 26107904 |
0.1878 | 0.4192 | 480 | 1.1247 | 26377800 |
0.1742 | 0.4235 | 485 | 1.1232 | 26659912 |
0.1783 | 0.4279 | 490 | 1.1208 | 26931560 |
0.2104 | 0.4323 | 495 | 1.1224 | 27209440 |
0.1218 | 0.4366 | 500 | 1.1213 | 27481104 |
0.1102 | 0.4410 | 505 | 1.1203 | 27754376 |
0.1473 | 0.4454 | 510 | 1.1196 | 28023424 |
0.149 | 0.4497 | 515 | 1.1203 | 28296872 |
0.1801 | 0.4541 | 520 | 1.1201 | 28577152 |
0.203 | 0.4585 | 525 | 1.1184 | 28850152 |
0.1636 | 0.4628 | 530 | 1.1170 | 29124952 |
0.1674 | 0.4672 | 535 | 1.1194 | 29400360 |
0.1811 | 0.4716 | 540 | 1.1182 | 29684280 |
0.2102 | 0.4759 | 545 | 1.1194 | 29952920 |
0.1514 | 0.4803 | 550 | 1.1166 | 30221512 |
0.1332 | 0.4847 | 555 | 1.1198 | 30499776 |
0.1623 | 0.4890 | 560 | 1.1194 | 30776440 |
0.1994 | 0.4934 | 565 | 1.1153 | 31045504 |
0.2285 | 0.4978 | 570 | 1.1161 | 31325064 |
0.1682 | 0.5021 | 575 | 1.1176 | 31598984 |
0.1332 | 0.5065 | 580 | 1.1144 | 31868440 |
0.135 | 0.5109 | 585 | 1.1144 | 32139552 |
0.1732 | 0.5152 | 590 | 1.1167 | 32411424 |
0.2228 | 0.5196 | 595 | 1.1159 | 32681448 |
0.234 | 0.5240 | 600 | 1.1138 | 32962440 |
0.1171 | 0.5283 | 605 | 1.1155 | 33231880 |
0.1293 | 0.5327 | 610 | 1.1145 | 33506032 |
0.2405 | 0.5371 | 615 | 1.1142 | 33776520 |
0.1389 | 0.5414 | 620 | 1.1140 | 34045496 |
0.2331 | 0.5458 | 625 | 1.1157 | 34318928 |
0.1769 | 0.5502 | 630 | 1.1139 | 34586240 |
0.1866 | 0.5545 | 635 | 1.1117 | 34856384 |
0.1404 | 0.5589 | 640 | 1.1154 | 35127864 |
0.1704 | 0.5633 | 645 | 1.1156 | 35399776 |
0.1404 | 0.5676 | 650 | 1.1126 | 35680856 |
0.1405 | 0.5720 | 655 | 1.1127 | 35957696 |
0.1463 | 0.5764 | 660 | 1.1129 | 36228672 |
0.1566 | 0.5807 | 665 | 1.1111 | 36509360 |
0.1923 | 0.5851 | 670 | 1.1119 | 36787464 |
0.1145 | 0.5895 | 675 | 1.1122 | 37058632 |
0.1703 | 0.5938 | 680 | 1.1116 | 37327656 |
0.139 | 0.5982 | 685 | 1.1105 | 37604024 |
0.1782 | 0.6026 | 690 | 1.1091 | 37885328 |
0.1668 | 0.6069 | 695 | 1.1096 | 38165216 |
0.1543 | 0.6113 | 700 | 1.1116 | 38437936 |
0.1282 | 0.6157 | 705 | 1.1109 | 38717672 |
0.1251 | 0.6200 | 710 | 1.1086 | 38994864 |
0.1721 | 0.6244 | 715 | 1.1076 | 39267376 |
0.1353 | 0.6288 | 720 | 1.1089 | 39550464 |
0.1786 | 0.6331 | 725 | 1.1083 | 39817376 |
0.1542 | 0.6375 | 730 | 1.1066 | 40088808 |
0.1477 | 0.6419 | 735 | 1.1072 | 40368304 |
0.1708 | 0.6462 | 740 | 1.1079 | 40634856 |
0.2069 | 0.6506 | 745 | 1.1066 | 40905752 |
0.2316 | 0.6550 | 750 | 1.1054 | 41186728 |
0.2252 | 0.6593 | 755 | 1.1051 | 41459000 |
0.1218 | 0.6637 | 760 | 1.1059 | 41728192 |
0.2073 | 0.6680 | 765 | 1.1073 | 42004560 |
0.1235 | 0.6724 | 770 | 1.1057 | 42269264 |
0.2765 | 0.6768 | 775 | 1.1058 | 42546136 |
0.1358 | 0.6811 | 780 | 1.1048 | 42821696 |
0.1769 | 0.6855 | 785 | 1.1065 | 43092296 |
0.1571 | 0.6899 | 790 | 1.1059 | 43359992 |
0.1193 | 0.6942 | 795 | 1.1049 | 43637672 |
0.0992 | 0.6986 | 800 | 1.1070 | 43915440 |
0.1337 | 0.7030 | 805 | 1.1072 | 44183016 |
0.1601 | 0.7073 | 810 | 1.1043 | 44456280 |
0.1707 | 0.7117 | 815 | 1.1037 | 44731904 |
0.183 | 0.7161 | 820 | 1.1049 | 45010704 |
0.1938 | 0.7204 | 825 | 1.1032 | 45292264 |
0.1256 | 0.7248 | 830 | 1.1038 | 45557576 |
0.1858 | 0.7292 | 835 | 1.1042 | 45832528 |
0.1172 | 0.7335 | 840 | 1.1039 | 46106904 |
0.1193 | 0.7379 | 845 | 1.1046 | 46373168 |
0.1984 | 0.7423 | 850 | 1.1037 | 46646440 |
0.1355 | 0.7466 | 855 | 1.1035 | 46924624 |
0.2577 | 0.7510 | 860 | 1.1052 | 47198208 |
0.1879 | 0.7554 | 865 | 1.1045 | 47471248 |
0.1758 | 0.7597 | 870 | 1.1049 | 47745600 |
0.1162 | 0.7641 | 875 | 1.1042 | 48023912 |
0.2038 | 0.7685 | 880 | 1.1043 | 48290496 |
0.2157 | 0.7728 | 885 | 1.1049 | 48565896 |
0.1394 | 0.7772 | 890 | 1.1020 | 48842992 |
0.1733 | 0.7816 | 895 | 1.1023 | 49117856 |
0.1516 | 0.7859 | 900 | 1.1038 | 49388680 |
0.0793 | 0.7903 | 905 | 1.1038 | 49654424 |
0.1641 | 0.7947 | 910 | 1.1019 | 49929584 |
0.1708 | 0.7990 | 915 | 1.1018 | 50203560 |
0.1514 | 0.8034 | 920 | 1.1021 | 50482272 |
0.1358 | 0.8078 | 925 | 1.1027 | 50749640 |
0.1963 | 0.8121 | 930 | 1.1017 | 51027608 |
0.1735 | 0.8165 | 935 | 1.1004 | 51304512 |
0.1695 | 0.8209 | 940 | 1.1023 | 51590936 |
0.1369 | 0.8252 | 945 | 1.1032 | 51871064 |
0.1472 | 0.8296 | 950 | 1.1027 | 52143496 |
0.1638 | 0.8340 | 955 | 1.1011 | 52416136 |
0.1297 | 0.8383 | 960 | 1.1016 | 52691072 |
0.1609 | 0.8427 | 965 | 1.1009 | 52967088 |
0.1903 | 0.8471 | 970 | 1.0997 | 53238592 |
0.2677 | 0.8514 | 975 | 1.1000 | 53513448 |
0.1403 | 0.8558 | 980 | 1.1008 | 53799096 |
0.1092 | 0.8602 | 985 | 1.1002 | 54076368 |
0.1304 | 0.8645 | 990 | 1.1000 | 54357080 |
0.1764 | 0.8689 | 995 | 1.1006 | 54633984 |
0.1718 | 0.8733 | 1000 | 1.1003 | 54907160 |
0.1788 | 0.8776 | 1005 | 1.0992 | 55186384 |
0.1556 | 0.8820 | 1010 | 1.1001 | 55460008 |
0.0977 | 0.8864 | 1015 | 1.1007 | 55735392 |
0.1393 | 0.8907 | 1020 | 1.1022 | 56014336 |
0.203 | 0.8951 | 1025 | 1.1001 | 56284968 |
0.1658 | 0.8995 | 1030 | 1.0979 | 56553464 |
0.1902 | 0.9038 | 1035 | 1.1010 | 56815960 |
0.1395 | 0.9082 | 1040 | 1.0996 | 57096160 |
0.1852 | 0.9126 | 1045 | 1.0975 | 57372328 |
0.1527 | 0.9169 | 1050 | 1.0991 | 57647464 |
0.1659 | 0.9213 | 1055 | 1.1006 | 57923328 |
0.1625 | 0.9257 | 1060 | 1.1000 | 58192416 |
0.0982 | 0.9300 | 1065 | 1.0980 | 58467072 |
0.1872 | 0.9344 | 1070 | 1.1018 | 58740424 |
0.1378 | 0.9388 | 1075 | 1.1034 | 59009824 |
0.1052 | 0.9431 | 1080 | 1.0974 | 59286712 |
0.1578 | 0.9475 | 1085 | 1.0958 | 59556064 |
0.1385 | 0.9519 | 1090 | 1.0988 | 59823064 |
0.1033 | 0.9562 | 1095 | 1.0999 | 60095056 |
0.1349 | 0.9606 | 1100 | 1.0988 | 60361392 |
0.1665 | 0.9650 | 1105 | 1.0986 | 60634848 |
0.1442 | 0.9693 | 1110 | 1.0979 | 60911216 |
0.1652 | 0.9737 | 1115 | 1.0982 | 61183904 |
0.1021 | 0.9781 | 1120 | 1.0972 | 61463664 |
0.1835 | 0.9824 | 1125 | 1.0988 | 61740360 |
0.1274 | 0.9868 | 1130 | 1.0979 | 62011024 |
0.1929 | 0.9912 | 1135 | 1.0951 | 62288224 |
0.1437 | 0.9955 | 1140 | 1.0958 | 62569648 |
0.1598 | 0.9999 | 1145 | 1.0961 | 62849584 |
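The token counts in the table can be cross-checked against the batch configuration: with roughly 62.8M input tokens over 1145 optimizer steps and an effective batch of 128 sequences, the average sequence length works out to around 430 tokens. This is a rough back-of-the-envelope estimate that assumes the reported counter includes every token in each batch.

```python
# Figures taken from the final row of the table above
total_tokens = 62_849_584     # final "Input Tokens Seen"
total_steps = 1145            # final optimizer step
effective_batch = 128         # total_train_batch_size

tokens_per_step = total_tokens / total_steps
avg_seq_len = tokens_per_step / effective_batch
print(round(tokens_per_step))  # ~54890 tokens per optimizer step
print(round(avg_seq_len))      # ~429 tokens per sequence on average
```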
### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
## Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter12_sftsd0

Base model: [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b)