---
license: gemma
base_model: google/gemma-2-9b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-9b_hs2_replace_iter6_sftsd2
    results: []
---

# collapse_gemma-2-9b_hs2_replace_iter6_sftsd2

This model is a fine-tuned version of [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 1.5665
- Num input tokens seen: 4530600
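
A minimal usage sketch for loading this checkpoint with `transformers` follows; the Hub repository id is inferred from the model name and may differ, and the prompt and generation settings are illustrative only.

```python
# Minimal usage sketch (repo id assumed from the model name; adjust if the actual path differs).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_replace_iter6_sftsd2"  # assumed Hub path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption; choose a dtype your hardware supports
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```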

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list for how they might map onto `TrainingArguments`):

- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
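
Below is a minimal sketch of these settings written out as a `transformers.TrainingArguments` object. Only the values listed above come from the card; the `output_dir` is an assumed placeholder, and the original run used the TRL SFT trainer rather than necessarily this exact construction.

```python
# Hedged sketch: the hyperparameters above expressed as transformers.TrainingArguments.
# output_dir is an assumed placeholder; options not listed in the card are omitted.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_replace_iter6_sftsd2",  # assumed output path
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=32,  # 4 per device x 32 accumulation steps = 128 total
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```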

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.2335          | 0                 |
| 1.3097        | 0.0511 | 5    | 1.0698          | 229532            |
| 0.4786        | 0.1021 | 10   | 1.2058          | 462632            |
| 0.1486        | 0.1532 | 15   | 1.3852          | 694572            |
| 0.0712        | 0.2043 | 20   | 1.4796          | 924920            |
| 0.0308        | 0.2553 | 25   | 1.6805          | 1159600           |
| 0.0246        | 0.3064 | 30   | 1.6485          | 1394384           |
| 0.0249        | 0.3575 | 35   | 1.7296          | 1625220           |
| 0.0245        | 0.4086 | 40   | 1.7449          | 1861604           |
| 0.0279        | 0.4596 | 45   | 1.6980          | 2092352           |
| 0.0308        | 0.5107 | 50   | 1.5217          | 2328876           |
| 0.0227        | 0.5618 | 55   | 1.4251          | 2558408           |
| 0.0198        | 0.6128 | 60   | 1.4416          | 2786776           |
| 0.0244        | 0.6639 | 65   | 1.4659          | 3022844           |
| 0.0223        | 0.7150 | 70   | 1.4740          | 3252704           |
| 0.0217        | 0.7660 | 75   | 1.4962          | 3486108           |
| 0.0223        | 0.8171 | 80   | 1.5131          | 3722364           |
| 0.0211        | 0.8682 | 85   | 1.5303          | 3959596           |
| 0.0202        | 0.9192 | 90   | 1.5489          | 4199296           |
| 0.0213        | 0.9703 | 95   | 1.5624          | 4434360           |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
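
A small, optional check (not part of the original card) that a local environment matches the versions listed above:

```python
# Compare installed library versions against those listed in the card.
import importlib

expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0",       # card lists 2.4.0+cu121
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
for name, version in expected.items():
    installed = importlib.import_module(name).__version__
    print(f"{name}: installed {installed}, card lists {version}")
```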