---
license: gemma
base_model: google/gemma-2-9b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd0
    results: []
---

# collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd0

This model is a fine-tuned version of [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.9550
- Num Input Tokens Seen: 24347300
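
Since the card lists no usage instructions, below is a minimal loading sketch with `transformers`. The repo id is an assumption inferred from the model name above, and the dtype/device settings are illustrative choices for a 9B model, not settings taken from this card.

```python
# Minimal loading sketch. The repo id is assumed from the model name above;
# dtype and device placement are illustrative, not specified by this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd0"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps 9B parameters to roughly 18 GB of weights
    device_map="auto",
)

prompt = "The key idea of model collapse is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```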

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of how they might map onto TRL follows the list):

- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 32
- total_train_batch_size: 128 (train_batch_size × gradient_accumulation_steps = 4 × 32)
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
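
Given the `trl`/`sft` tags on this card, a hedged sketch of how these settings might be expressed with TRL's `SFTTrainer` is shown below. The training dataset is unknown, so the dataset here is a stand-in; `output_dir`, the text field, and the use of `SFTConfig` (TRL ≥ 0.9) are assumptions, not details from the card.

```python
# Sketch only: the card's actual dataset and trainer setup are not published.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder dataset; the real training data for this model is unknown.
train_dataset = load_dataset("stanfordnlp/imdb", split="train")

config = SFTConfig(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd0",  # assumed
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=32,   # 4 x 32 = 128 examples per optimizer step
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                   # Adam settings as reported in the card
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    dataset_text_field="text",        # assumed field name for the placeholder data
)

trainer = SFTTrainer(
    model="google/gemma-2-9b",        # base model named in the card
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```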

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.2335          | 0                 |
| 1.4207        | 0.0104 | 5    | 1.1924          | 257832            |
| 1.2547        | 0.0208 | 10   | 1.0837          | 513264            |
| 1.048         | 0.0312 | 15   | 1.0324          | 766088            |
| 0.851         | 0.0416 | 20   | 1.0155          | 1020968           |
| 0.7142        | 0.0520 | 25   | 1.0140          | 1278292           |
| 0.5167        | 0.0624 | 30   | 1.0245          | 1526988           |
| 0.3847        | 0.0728 | 35   | 1.0231          | 1784656           |
| 0.3415        | 0.0832 | 40   | 1.0195          | 2041944           |
| 0.3307        | 0.0936 | 45   | 1.0139          | 2297136           |
| 0.3229        | 0.1040 | 50   | 1.0051          | 2548836           |
| 0.2929        | 0.1144 | 55   | 1.0047          | 2801308           |
| 0.3572        | 0.1247 | 60   | 1.0015          | 3059268           |
| 0.2615        | 0.1351 | 65   | 0.9968          | 3313844           |
| 0.3591        | 0.1455 | 70   | 0.9979          | 3562524           |
| 0.2775        | 0.1559 | 75   | 0.9933          | 3813084           |
| 0.2585        | 0.1663 | 80   | 0.9926          | 4066920           |
| 0.2725        | 0.1767 | 85   | 0.9883          | 4324928           |
| 0.2611        | 0.1871 | 90   | 0.9864          | 4580444           |
| 0.353         | 0.1975 | 95   | 0.9846          | 4830100           |
| 0.2454        | 0.2079 | 100  | 0.9834          | 5075616           |
| 0.2581        | 0.2183 | 105  | 0.9827          | 5325788           |
| 0.2402        | 0.2287 | 110  | 0.9801          | 5572856           |
| 0.2896        | 0.2391 | 115  | 0.9798          | 5833092           |
| 0.2406        | 0.2495 | 120  | 0.9787          | 6088344           |
| 0.2592        | 0.2599 | 125  | 0.9775          | 6341032           |
| 0.2952        | 0.2703 | 130  | 0.9766          | 6602212           |
| 0.2539        | 0.2807 | 135  | 0.9747          | 6855168           |
| 0.2138        | 0.2911 | 140  | 0.9729          | 7107748           |
| 0.2758        | 0.3015 | 145  | 0.9749          | 7362420           |
| 0.2321        | 0.3119 | 150  | 0.9724          | 7615056           |
| 0.1961        | 0.3223 | 155  | 0.9723          | 7871808           |
| 0.2426        | 0.3327 | 160  | 0.9729          | 8125796           |
| 0.3097        | 0.3431 | 165  | 0.9724          | 8379432           |
| 0.1816        | 0.3535 | 170  | 0.9702          | 8630796           |
| 0.2926        | 0.3638 | 175  | 0.9708          | 8877972           |
| 0.2079        | 0.3742 | 180  | 0.9716          | 9125056           |
| 0.2498        | 0.3846 | 185  | 0.9712          | 9374900           |
| 0.2174        | 0.3950 | 190  | 0.9698          | 9627612           |
| 0.2393        | 0.4054 | 195  | 0.9707          | 9875172           |
| 0.2249        | 0.4158 | 200  | 0.9702          | 10126424          |
| 0.2005        | 0.4262 | 205  | 0.9681          | 10377444          |
| 0.2247        | 0.4366 | 210  | 0.9671          | 10631588          |
| 0.2091        | 0.4470 | 215  | 0.9659          | 10892176          |
| 0.2034        | 0.4574 | 220  | 0.9664          | 11142312          |
| 0.2397        | 0.4678 | 225  | 0.9670          | 11402760          |
| 0.2517        | 0.4782 | 230  | 0.9644          | 11657056          |
| 0.3291        | 0.4886 | 235  | 0.9645          | 11902520          |
| 0.2911        | 0.4990 | 240  | 0.9631          | 12151572          |
| 0.2172        | 0.5094 | 245  | 0.9614          | 12414324          |
| 0.2315        | 0.5198 | 250  | 0.9618          | 12672860          |
| 0.2185        | 0.5302 | 255  | 0.9623          | 12927456          |
| 0.2485        | 0.5406 | 260  | 0.9601          | 13184492          |
| 0.3055        | 0.5510 | 265  | 0.9610          | 13440432          |
| 0.2073        | 0.5614 | 270  | 0.9615          | 13691400          |
| 0.3036        | 0.5718 | 275  | 0.9600          | 13940060          |
| 0.2752        | 0.5822 | 280  | 0.9594          | 14195336          |
| 0.2654        | 0.5926 | 285  | 0.9597          | 14447088          |
| 0.3343        | 0.6029 | 290  | 0.9593          | 14706880          |
| 0.3417        | 0.6133 | 295  | 0.9601          | 14964384          |
| 0.2027        | 0.6237 | 300  | 0.9586          | 15212096          |
| 0.2576        | 0.6341 | 305  | 0.9574          | 15460956          |
| 0.2259        | 0.6445 | 310  | 0.9583          | 15719636          |
| 0.245         | 0.6549 | 315  | 0.9566          | 15980620          |
| 0.2193        | 0.6653 | 320  | 0.9582          | 16237904          |
| 0.2397        | 0.6757 | 325  | 0.9595          | 16490244          |
| 0.2264        | 0.6861 | 330  | 0.9567          | 16749444          |
| 0.2564        | 0.6965 | 335  | 0.9565          | 16996912          |
| 0.2242        | 0.7069 | 340  | 0.9561          | 17255000          |
| 0.2263        | 0.7173 | 345  | 0.9544          | 17508552          |
| 0.2417        | 0.7277 | 350  | 0.9554          | 17761116          |
| 0.2355        | 0.7381 | 355  | 0.9538          | 18019056          |
| 0.2344        | 0.7485 | 360  | 0.9538          | 18273916          |
| 0.2404        | 0.7589 | 365  | 0.9565          | 18524148          |
| 0.1552        | 0.7693 | 370  | 0.9577          | 18777776          |
| 0.2278        | 0.7797 | 375  | 0.9569          | 19028256          |
| 0.2164        | 0.7901 | 380  | 0.9555          | 19288972          |
| 0.1864        | 0.8005 | 385  | 0.9569          | 19539736          |
| 0.2767        | 0.8109 | 390  | 0.9572          | 19789388          |
| 0.2737        | 0.8213 | 395  | 0.9565          | 20034884          |
| 0.2266        | 0.8317 | 400  | 0.9566          | 20285948          |
| 0.2633        | 0.8421 | 405  | 0.9586          | 20534344          |
| 0.1812        | 0.8524 | 410  | 0.9545          | 20788364          |
| 0.2365        | 0.8628 | 415  | 0.9527          | 21043348          |
| 0.2148        | 0.8732 | 420  | 0.9536          | 21296084          |
| 0.2508        | 0.8836 | 425  | 0.9556          | 21551736          |
| 0.2298        | 0.8940 | 430  | 0.9553          | 21803092          |
| 0.2442        | 0.9044 | 435  | 0.9564          | 22059360          |
| 0.2786        | 0.9148 | 440  | 0.9550          | 22314208          |
| 0.2686        | 0.9252 | 445  | 0.9546          | 22566492          |
| 0.2733        | 0.9356 | 450  | 0.9567          | 22819036          |
| 0.2783        | 0.9460 | 455  | 0.9570          | 23073228          |
| 0.2188        | 0.9564 | 460  | 0.9528          | 23324836          |
| 0.275         | 0.9668 | 465  | 0.9525          | 23582856          |
| 0.2923        | 0.9772 | 470  | 0.9532          | 23837532          |
| 0.2072        | 0.9876 | 475  | 0.9537          | 24088876          |
| 0.2018        | 0.9980 | 480  | 0.9550          | 24347300          |

### Framework versions

- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
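
To reproduce against these exact pins, a quick check like the following can confirm the local environment matches; the expected values are copied from the list above.

```python
# Sanity-check installed versions against the pins listed in this card.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
actual = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    status = "OK" if actual[name] == want else f"MISMATCH (have {actual[name]})"
    print(f"{name}=={want}: {status}")
```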