---
license: gemma
base_model: google/gemma-2-9b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd2
    results: []
---

# collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd2

This model is a fine-tuned version of [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.9546
- Num Input Tokens Seen: 24032428
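
A minimal loading sketch with `transformers`; the repo id below is assumed from this card's title, and the dtype/device settings are reasonable defaults rather than settings documented for this run:

```python
# Minimal sketch of loading the checkpoint. The repo id is assumed from the
# card title; torch_dtype and device_map are illustrative defaults.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd2"  # assumed
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```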

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a reproduction sketch follows the list):

- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 32
- total_train_batch_size: 128 (train_batch_size × gradient_accumulation_steps = 4 × 32)
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
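
A sketch of how this configuration might be reproduced with TRL's `SFTTrainer`. The training dataset is undocumented, so the data loading, file name, and text field below are placeholders; the hyperparameters mirror the list above, and the Adam betas/epsilon are left at their `TrainingArguments` defaults, which match the values listed:

```python
# Reproduction sketch (assumed: TRL with SFTConfig, a JSON-lines file with a
# "text" field standing in for the unknown dataset).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

model_name = "google/gemma-2-9b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Placeholder: the actual fine-tuning data is unknown.
dataset = load_dataset("json", data_files="train.jsonl", split="train")

args = SFTConfig(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd2",
    dataset_text_field="text",            # assumed field name
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=32,       # effective batch size: 4 * 32 = 128
    seed=2,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    eval_strategy="steps",
    eval_steps=5,                         # matches the 5-step eval cadence below
    include_num_input_tokens_seen=True,   # logs the "Input Tokens Seen" column
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    eval_dataset=dataset,  # placeholder; the real eval split is unknown
    tokenizer=tokenizer,
)
trainer.train()
```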

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.2335          | 0                 |
| 1.3474        | 0.0106 | 5    | 1.1927          | 258792            |
| 1.2078        | 0.0212 | 10   | 1.0768          | 508764            |
| 0.9674        | 0.0317 | 15   | 1.0312          | 767600            |
| 0.9091        | 0.0423 | 20   | 1.0227          | 1023848           |
| 0.6911        | 0.0529 | 25   | 1.0203          | 1274344           |
| 0.4705        | 0.0635 | 30   | 1.0319          | 1532132           |
| 0.4228        | 0.0741 | 35   | 1.0292          | 1786368           |
| 0.3975        | 0.0846 | 40   | 1.0309          | 2042540           |
| 0.3206        | 0.0952 | 45   | 1.0153          | 2291720           |
| 0.2994        | 0.1058 | 50   | 1.0148          | 2549012           |
| 0.3297        | 0.1164 | 55   | 1.0050          | 2798620           |
| 0.3171        | 0.1270 | 60   | 1.0040          | 3056108           |
| 0.3139        | 0.1375 | 65   | 1.0020          | 3317448           |
| 0.3386        | 0.1481 | 70   | 0.9962          | 3574540           |
| 0.2501        | 0.1587 | 75   | 0.9942          | 3832272           |
| 0.2482        | 0.1693 | 80   | 0.9906          | 4083760           |
| 0.3098        | 0.1799 | 85   | 0.9875          | 4336164           |
| 0.2415        | 0.1904 | 90   | 0.9910          | 4592732           |
| 0.2895        | 0.2010 | 95   | 0.9856          | 4844652           |
| 0.3474        | 0.2116 | 100  | 0.9849          | 5098604           |
| 0.2472        | 0.2222 | 105  | 0.9823          | 5355028           |
| 0.2587        | 0.2328 | 110  | 0.9795          | 5604792           |
| 0.2691        | 0.2433 | 115  | 0.9779          | 5860556           |
| 0.2396        | 0.2539 | 120  | 0.9761          | 6117824           |
| 0.2505        | 0.2645 | 125  | 0.9776          | 6368016           |
| 0.2609        | 0.2751 | 130  | 0.9765          | 6626036           |
| 0.3553        | 0.2857 | 135  | 0.9746          | 6883312           |
| 0.2906        | 0.2962 | 140  | 0.9750          | 7139620           |
| 0.2989        | 0.3068 | 145  | 0.9738          | 7392496           |
| 0.3201        | 0.3174 | 150  | 0.9707          | 7646420           |
| 0.2327        | 0.3280 | 155  | 0.9708          | 7896552           |
| 0.281         | 0.3386 | 160  | 0.9712          | 8147848           |
| 0.291         | 0.3491 | 165  | 0.9701          | 8401936           |
| 0.3371        | 0.3597 | 170  | 0.9699          | 8654404           |
| 0.1926        | 0.3703 | 175  | 0.9703          | 8904204           |
| 0.286         | 0.3809 | 180  | 0.9703          | 9158204           |
| 0.2423        | 0.3915 | 185  | 0.9669          | 9411824           |
| 0.245         | 0.4020 | 190  | 0.9668          | 9665800           |
| 0.2902        | 0.4126 | 195  | 0.9697          | 9920256           |
| 0.2895        | 0.4232 | 200  | 0.9675          | 10172112          |
| 0.2431        | 0.4338 | 205  | 0.9671          | 10423820          |
| 0.286         | 0.4444 | 210  | 0.9665          | 10685328          |
| 0.3157        | 0.4549 | 215  | 0.9656          | 10942652          |
| 0.225         | 0.4655 | 220  | 0.9658          | 11199576          |
| 0.2655        | 0.4761 | 225  | 0.9654          | 11458748          |
| 0.2338        | 0.4867 | 230  | 0.9646          | 11713636          |
| 0.2768        | 0.4973 | 235  | 0.9647          | 11970092          |
| 0.2008        | 0.5078 | 240  | 0.9649          | 12215860          |
| 0.2491        | 0.5184 | 245  | 0.9634          | 12470476          |
| 0.2654        | 0.5290 | 250  | 0.9622          | 12726180          |
| 0.233         | 0.5396 | 255  | 0.9652          | 12977864          |
| 0.2297        | 0.5502 | 260  | 0.9652          | 13228988          |
| 0.2123        | 0.5607 | 265  | 0.9641          | 13487916          |
| 0.3055        | 0.5713 | 270  | 0.9622          | 13742944          |
| 0.252         | 0.5819 | 275  | 0.9627          | 14001064          |
| 0.2156        | 0.5925 | 280  | 0.9633          | 14257372          |
| 0.2373        | 0.6031 | 285  | 0.9630          | 14515772          |
| 0.2533        | 0.6136 | 290  | 0.9633          | 14773828          |
| 0.3101        | 0.6242 | 295  | 0.9634          | 15032732          |
| 0.2549        | 0.6348 | 300  | 0.9640          | 15287912          |
| 0.2208        | 0.6454 | 305  | 0.9621          | 15542376          |
| 0.2164        | 0.6560 | 310  | 0.9607          | 15798084          |
| 0.1831        | 0.6665 | 315  | 0.9612          | 16052724          |
| 0.2364        | 0.6771 | 320  | 0.9615          | 16306560          |
| 0.2993        | 0.6877 | 325  | 0.9622          | 16558416          |
| 0.2002        | 0.6983 | 330  | 0.9607          | 16807372          |
| 0.1973        | 0.7089 | 335  | 0.9597          | 17064900          |
| 0.3301        | 0.7194 | 340  | 0.9594          | 17318932          |
| 0.319         | 0.7300 | 345  | 0.9598          | 17577108          |
| 0.2533        | 0.7406 | 350  | 0.9579          | 17826528          |
| 0.1998        | 0.7512 | 355  | 0.9565          | 18081976          |
| 0.2274        | 0.7618 | 360  | 0.9560          | 18343304          |
| 0.2253        | 0.7723 | 365  | 0.9567          | 18596660          |
| 0.2473        | 0.7829 | 370  | 0.9566          | 18850916          |
| 0.2654        | 0.7935 | 375  | 0.9565          | 19111356          |
| 0.2053        | 0.8041 | 380  | 0.9557          | 19366576          |
| 0.2462        | 0.8147 | 385  | 0.9549          | 19620052          |
| 0.2217        | 0.8252 | 390  | 0.9573          | 19876608          |
| 0.23          | 0.8358 | 395  | 0.9586          | 20129892          |
| 0.2487        | 0.8464 | 400  | 0.9563          | 20384164          |
| 0.1914        | 0.8570 | 405  | 0.9562          | 20634768          |
| 0.2452        | 0.8676 | 410  | 0.9581          | 20891764          |
| 0.1935        | 0.8781 | 415  | 0.9572          | 21139492          |
| 0.3047        | 0.8887 | 420  | 0.9544          | 21396976          |
| 0.2257        | 0.8993 | 425  | 0.9555          | 21644644          |
| 0.2405        | 0.9099 | 430  | 0.9558          | 21891880          |
| 0.2522        | 0.9205 | 435  | 0.9547          | 22153024          |
| 0.2481        | 0.9310 | 440  | 0.9527          | 22404200          |
| 0.2242        | 0.9416 | 445  | 0.9527          | 22657000          |
| 0.3352        | 0.9522 | 450  | 0.9527          | 22911872          |
| 0.1884        | 0.9628 | 455  | 0.9540          | 23163792          |
| 0.2011        | 0.9734 | 460  | 0.9537          | 23423212          |
| 0.1947        | 0.9839 | 465  | 0.9525          | 23677144          |
| 0.29          | 0.9945 | 470  | 0.9534          | 23929996          |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
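
A small sketch for checking that a local environment matches these pinned versions:

```python
# Sanity-check installed versions against the ones listed in this card.
import datasets
import tokenizers
import torch
import transformers

expected = {
    transformers: "4.44.0",
    torch: "2.4.0+cu121",
    datasets: "2.20.0",
    tokenizers: "0.19.1",
}
for module, version in expected.items():
    marker = "OK" if module.__version__ == version else "MISMATCH"
    print(f"{module.__name__}: {module.__version__} (card: {version}) [{marker}]")
```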