---
license: gemma
base_model: google/gemma-2-9b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd2
  results: []
---

# collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd2

This model is a fine-tuned version of [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9546
- Num Input Tokens Seen: 24032428

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
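For orientation, the list above maps onto a `transformers.TrainingArguments` configuration roughly as sketched below. This is a minimal, hedged reconstruction, not the exact training script: it assumes a single training device (so 4 × 32 accumulation steps yields the effective batch size of 128), the `output_dir` is a placeholder, and the dataset used is not recorded on this card.

```python
from transformers import TrainingArguments

# Minimal sketch mirroring the hyperparameters listed above.
# output_dir is hypothetical; the actual training data is unknown.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd2",
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=32,  # 4 * 32 = effective batch size 128 on one device
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
)
```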
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.2335 | 0 |
| 1.3474 | 0.0106 | 5 | 1.1927 | 258792 |
| 1.2078 | 0.0212 | 10 | 1.0768 | 508764 |
| 0.9674 | 0.0317 | 15 | 1.0312 | 767600 |
| 0.9091 | 0.0423 | 20 | 1.0227 | 1023848 |
| 0.6911 | 0.0529 | 25 | 1.0203 | 1274344 |
| 0.4705 | 0.0635 | 30 | 1.0319 | 1532132 |
| 0.4228 | 0.0741 | 35 | 1.0292 | 1786368 |
| 0.3975 | 0.0846 | 40 | 1.0309 | 2042540 |
| 0.3206 | 0.0952 | 45 | 1.0153 | 2291720 |
| 0.2994 | 0.1058 | 50 | 1.0148 | 2549012 |
| 0.3297 | 0.1164 | 55 | 1.0050 | 2798620 |
| 0.3171 | 0.1270 | 60 | 1.0040 | 3056108 |
| 0.3139 | 0.1375 | 65 | 1.0020 | 3317448 |
| 0.3386 | 0.1481 | 70 | 0.9962 | 3574540 |
| 0.2501 | 0.1587 | 75 | 0.9942 | 3832272 |
| 0.2482 | 0.1693 | 80 | 0.9906 | 4083760 |
| 0.3098 | 0.1799 | 85 | 0.9875 | 4336164 |
| 0.2415 | 0.1904 | 90 | 0.9910 | 4592732 |
| 0.2895 | 0.2010 | 95 | 0.9856 | 4844652 |
| 0.3474 | 0.2116 | 100 | 0.9849 | 5098604 |
| 0.2472 | 0.2222 | 105 | 0.9823 | 5355028 |
| 0.2587 | 0.2328 | 110 | 0.9795 | 5604792 |
| 0.2691 | 0.2433 | 115 | 0.9779 | 5860556 |
| 0.2396 | 0.2539 | 120 | 0.9761 | 6117824 |
| 0.2505 | 0.2645 | 125 | 0.9776 | 6368016 |
| 0.2609 | 0.2751 | 130 | 0.9765 | 6626036 |
| 0.3553 | 0.2857 | 135 | 0.9746 | 6883312 |
| 0.2906 | 0.2962 | 140 | 0.9750 | 7139620 |
| 0.2989 | 0.3068 | 145 | 0.9738 | 7392496 |
| 0.3201 | 0.3174 | 150 | 0.9707 | 7646420 |
| 0.2327 | 0.3280 | 155 | 0.9708 | 7896552 |
| 0.281 | 0.3386 | 160 | 0.9712 | 8147848 |
| 0.291 | 0.3491 | 165 | 0.9701 | 8401936 |
| 0.3371 | 0.3597 | 170 | 0.9699 | 8654404 |
| 0.1926 | 0.3703 | 175 | 0.9703 | 8904204 |
| 0.286 | 0.3809 | 180 | 0.9703 | 9158204 |
| 0.2423 | 0.3915 | 185 | 0.9669 | 9411824 |
| 0.245 | 0.4020 | 190 | 0.9668 | 9665800 |
| 0.2902 | 0.4126 | 195 | 0.9697 | 9920256 |
| 0.2895 | 0.4232 | 200 | 0.9675 | 10172112 |
| 0.2431 | 0.4338 | 205 | 0.9671 | 10423820 |
| 0.286 | 0.4444 | 210 | 0.9665 | 10685328 |
| 0.3157 | 0.4549 | 215 | 0.9656 | 10942652 |
| 0.225 | 0.4655 | 220 | 0.9658 | 11199576 |
| 0.2655 | 0.4761 | 225 | 0.9654 | 11458748 |
| 0.2338 | 0.4867 | 230 | 0.9646 | 11713636 |
| 0.2768 | 0.4973 | 235 | 0.9647 | 11970092 |
| 0.2008 | 0.5078 | 240 | 0.9649 | 12215860 |
| 0.2491 | 0.5184 | 245 | 0.9634 | 12470476 |
| 0.2654 | 0.5290 | 250 | 0.9622 | 12726180 |
| 0.233 | 0.5396 | 255 | 0.9652 | 12977864 |
| 0.2297 | 0.5502 | 260 | 0.9652 | 13228988 |
| 0.2123 | 0.5607 | 265 | 0.9641 | 13487916 |
| 0.3055 | 0.5713 | 270 | 0.9622 | 13742944 |
| 0.252 | 0.5819 | 275 | 0.9627 | 14001064 |
| 0.2156 | 0.5925 | 280 | 0.9633 | 14257372 |
| 0.2373 | 0.6031 | 285 | 0.9630 | 14515772 |
| 0.2533 | 0.6136 | 290 | 0.9633 | 14773828 |
| 0.3101 | 0.6242 | 295 | 0.9634 | 15032732 |
| 0.2549 | 0.6348 | 300 | 0.9640 | 15287912 |
| 0.2208 | 0.6454 | 305 | 0.9621 | 15542376 |
| 0.2164 | 0.6560 | 310 | 0.9607 | 15798084 |
| 0.1831 | 0.6665 | 315 | 0.9612 | 16052724 |
| 0.2364 | 0.6771 | 320 | 0.9615 | 16306560 |
| 0.2993 | 0.6877 | 325 | 0.9622 | 16558416 |
| 0.2002 | 0.6983 | 330 | 0.9607 | 16807372 |
| 0.1973 | 0.7089 | 335 | 0.9597 | 17064900 |
| 0.3301 | 0.7194 | 340 | 0.9594 | 17318932 |
| 0.319 | 0.7300 | 345 | 0.9598 | 17577108 |
| 0.2533 | 0.7406 | 350 | 0.9579 | 17826528 |
| 0.1998 | 0.7512 | 355 | 0.9565 | 18081976 |
| 0.2274 | 0.7618 | 360 | 0.9560 | 18343304 |
| 0.2253 | 0.7723 | 365 | 0.9567 | 18596660 |
| 0.2473 | 0.7829 | 370 | 0.9566 | 18850916 |
| 0.2654 | 0.7935 | 375 | 0.9565 | 19111356 |
| 0.2053 | 0.8041 | 380 | 0.9557 | 19366576 |
| 0.2462 | 0.8147 | 385 | 0.9549 | 19620052 |
| 0.2217 | 0.8252 | 390 | 0.9573 | 19876608 |
| 0.23 | 0.8358 | 395 | 0.9586 | 20129892 |
| 0.2487 | 0.8464 | 400 | 0.9563 | 20384164 |
| 0.1914 | 0.8570 | 405 | 0.9562 | 20634768 |
| 0.2452 | 0.8676 | 410 | 0.9581 | 20891764 |
| 0.1935 | 0.8781 | 415 | 0.9572 | 21139492 |
| 0.3047 | 0.8887 | 420 | 0.9544 | 21396976 |
| 0.2257 | 0.8993 | 425 | 0.9555 | 21644644 |
| 0.2405 | 0.9099 | 430 | 0.9558 | 21891880 |
| 0.2522 | 0.9205 | 435 | 0.9547 | 22153024 |
| 0.2481 | 0.9310 | 440 | 0.9527 | 22404200 |
| 0.2242 | 0.9416 | 445 | 0.9527 | 22657000 |
| 0.3352 | 0.9522 | 450 | 0.9527 | 22911872 |
| 0.1884 | 0.9628 | 455 | 0.9540 | 23163792 |
| 0.2011 | 0.9734 | 460 | 0.9537 | 23423212 |
| 0.1947 | 0.9839 | 465 | 0.9525 | 23677144 |
| 0.29 | 0.9945 | 470 | 0.9534 | 23929996 |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
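For reference, a minimal loading sketch compatible with the framework versions above. The repository id is assumed from the model name on this card (it may live under a namespace on the Hub), and `device_map="auto"` additionally requires the `accelerate` package.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical hub id, taken from the model name on this card.
model_id = "collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Simple generation example.
inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```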