---
license: gemma
base_model: google/gemma-2-9b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd2
  results: []
---

# collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd2

This model is a fine-tuned version of [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9546
- Num Input Tokens Seen: 24032428
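
The checkpoint can be loaded with the standard `transformers` API. A minimal inference sketch, assuming the model is published under the name above (substitute the actual hub id or a local path if it differs):

```python
# Minimal inference sketch; the repo id below is assumed from the
# model name above and may need to be replaced with the actual hub id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd2"  # assumed id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Gemma-2 ships bfloat16 weights
    device_map="auto",
)

prompt = "The key idea of model collapse is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```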

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto a TRL config follows the list):
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
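
For reference, a minimal sketch of how the values above map onto a TRL `SFTConfig` (which subclasses `TrainingArguments`). The `output_dir` is a placeholder, and the total train batch size of 128 assumes a single device (4 × 32 accumulation steps):

```python
# Sketch only: reconstructs the reported hyperparameters as a TRL
# SFTConfig. The training dataset is undocumented, so no dataset or
# trainer wiring is shown here.
from trl import SFTConfig

config = SFTConfig(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd2",  # placeholder
    learning_rate=8e-06,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=32,  # 4 * 32 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,       # Adam betas and epsilon as reported above
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```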

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.2335 | 0 |
1.3474 | 0.0106 | 5 | 1.1927 | 258792 |
1.2078 | 0.0212 | 10 | 1.0768 | 508764 |
0.9674 | 0.0317 | 15 | 1.0312 | 767600 |
0.9091 | 0.0423 | 20 | 1.0227 | 1023848 |
0.6911 | 0.0529 | 25 | 1.0203 | 1274344 |
0.4705 | 0.0635 | 30 | 1.0319 | 1532132 |
0.4228 | 0.0741 | 35 | 1.0292 | 1786368 |
0.3975 | 0.0846 | 40 | 1.0309 | 2042540 |
0.3206 | 0.0952 | 45 | 1.0153 | 2291720 |
0.2994 | 0.1058 | 50 | 1.0148 | 2549012 |
0.3297 | 0.1164 | 55 | 1.0050 | 2798620 |
0.3171 | 0.1270 | 60 | 1.0040 | 3056108 |
0.3139 | 0.1375 | 65 | 1.0020 | 3317448 |
0.3386 | 0.1481 | 70 | 0.9962 | 3574540 |
0.2501 | 0.1587 | 75 | 0.9942 | 3832272 |
0.2482 | 0.1693 | 80 | 0.9906 | 4083760 |
0.3098 | 0.1799 | 85 | 0.9875 | 4336164 |
0.2415 | 0.1904 | 90 | 0.9910 | 4592732 |
0.2895 | 0.2010 | 95 | 0.9856 | 4844652 |
0.3474 | 0.2116 | 100 | 0.9849 | 5098604 |
0.2472 | 0.2222 | 105 | 0.9823 | 5355028 |
0.2587 | 0.2328 | 110 | 0.9795 | 5604792 |
0.2691 | 0.2433 | 115 | 0.9779 | 5860556 |
0.2396 | 0.2539 | 120 | 0.9761 | 6117824 |
0.2505 | 0.2645 | 125 | 0.9776 | 6368016 |
0.2609 | 0.2751 | 130 | 0.9765 | 6626036 |
0.3553 | 0.2857 | 135 | 0.9746 | 6883312 |
0.2906 | 0.2962 | 140 | 0.9750 | 7139620 |
0.2989 | 0.3068 | 145 | 0.9738 | 7392496 |
0.3201 | 0.3174 | 150 | 0.9707 | 7646420 |
0.2327 | 0.3280 | 155 | 0.9708 | 7896552 |
0.281 | 0.3386 | 160 | 0.9712 | 8147848 |
0.291 | 0.3491 | 165 | 0.9701 | 8401936 |
0.3371 | 0.3597 | 170 | 0.9699 | 8654404 |
0.1926 | 0.3703 | 175 | 0.9703 | 8904204 |
0.286 | 0.3809 | 180 | 0.9703 | 9158204 |
0.2423 | 0.3915 | 185 | 0.9669 | 9411824 |
0.245 | 0.4020 | 190 | 0.9668 | 9665800 |
0.2902 | 0.4126 | 195 | 0.9697 | 9920256 |
0.2895 | 0.4232 | 200 | 0.9675 | 10172112 |
0.2431 | 0.4338 | 205 | 0.9671 | 10423820 |
0.286 | 0.4444 | 210 | 0.9665 | 10685328 |
0.3157 | 0.4549 | 215 | 0.9656 | 10942652 |
0.225 | 0.4655 | 220 | 0.9658 | 11199576 |
0.2655 | 0.4761 | 225 | 0.9654 | 11458748 |
0.2338 | 0.4867 | 230 | 0.9646 | 11713636 |
0.2768 | 0.4973 | 235 | 0.9647 | 11970092 |
0.2008 | 0.5078 | 240 | 0.9649 | 12215860 |
0.2491 | 0.5184 | 245 | 0.9634 | 12470476 |
0.2654 | 0.5290 | 250 | 0.9622 | 12726180 |
0.233 | 0.5396 | 255 | 0.9652 | 12977864 |
0.2297 | 0.5502 | 260 | 0.9652 | 13228988 |
0.2123 | 0.5607 | 265 | 0.9641 | 13487916 |
0.3055 | 0.5713 | 270 | 0.9622 | 13742944 |
0.252 | 0.5819 | 275 | 0.9627 | 14001064 |
0.2156 | 0.5925 | 280 | 0.9633 | 14257372 |
0.2373 | 0.6031 | 285 | 0.9630 | 14515772 |
0.2533 | 0.6136 | 290 | 0.9633 | 14773828 |
0.3101 | 0.6242 | 295 | 0.9634 | 15032732 |
0.2549 | 0.6348 | 300 | 0.9640 | 15287912 |
0.2208 | 0.6454 | 305 | 0.9621 | 15542376 |
0.2164 | 0.6560 | 310 | 0.9607 | 15798084 |
0.1831 | 0.6665 | 315 | 0.9612 | 16052724 |
0.2364 | 0.6771 | 320 | 0.9615 | 16306560 |
0.2993 | 0.6877 | 325 | 0.9622 | 16558416 |
0.2002 | 0.6983 | 330 | 0.9607 | 16807372 |
0.1973 | 0.7089 | 335 | 0.9597 | 17064900 |
0.3301 | 0.7194 | 340 | 0.9594 | 17318932 |
0.319 | 0.7300 | 345 | 0.9598 | 17577108 |
0.2533 | 0.7406 | 350 | 0.9579 | 17826528 |
0.1998 | 0.7512 | 355 | 0.9565 | 18081976 |
0.2274 | 0.7618 | 360 | 0.9560 | 18343304 |
0.2253 | 0.7723 | 365 | 0.9567 | 18596660 |
0.2473 | 0.7829 | 370 | 0.9566 | 18850916 |
0.2654 | 0.7935 | 375 | 0.9565 | 19111356 |
0.2053 | 0.8041 | 380 | 0.9557 | 19366576 |
0.2462 | 0.8147 | 385 | 0.9549 | 19620052 |
0.2217 | 0.8252 | 390 | 0.9573 | 19876608 |
0.23 | 0.8358 | 395 | 0.9586 | 20129892 |
0.2487 | 0.8464 | 400 | 0.9563 | 20384164 |
0.1914 | 0.8570 | 405 | 0.9562 | 20634768 |
0.2452 | 0.8676 | 410 | 0.9581 | 20891764 |
0.1935 | 0.8781 | 415 | 0.9572 | 21139492 |
0.3047 | 0.8887 | 420 | 0.9544 | 21396976 |
0.2257 | 0.8993 | 425 | 0.9555 | 21644644 |
0.2405 | 0.9099 | 430 | 0.9558 | 21891880 |
0.2522 | 0.9205 | 435 | 0.9547 | 22153024 |
0.2481 | 0.9310 | 440 | 0.9527 | 22404200 |
0.2242 | 0.9416 | 445 | 0.9527 | 22657000 |
0.3352 | 0.9522 | 450 | 0.9527 | 22911872 |
0.1884 | 0.9628 | 455 | 0.9540 | 23163792 |
0.2011 | 0.9734 | 460 | 0.9537 | 23423212 |
0.1947 | 0.9839 | 465 | 0.9525 | 23677144 |
0.29 | 0.9945 | 470 | 0.9534 | 23929996 |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1