---
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_replace_iter18_sftsd2
    results: []
---

# collapse_gemma-2-2b_hs2_replace_iter18_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set (a usage sketch follows the list):

- Loss: 2.6236
- Num input tokens seen: 4624112
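Since the usage sections below are still marked "More information needed", here is a minimal inference sketch using the `transformers` library. The repository id `RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter18_sftsd2` is assumed from the model name on this card, and the dtype/device settings are illustrative defaults, not documented choices.

```python
# Minimal inference sketch; the repo id and dtype are assumptions, not documented.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter18_sftsd2"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed; Gemma-2 checkpoints are commonly run in bf16
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```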

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (an illustrative configuration sketch follows the list):

- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
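As a rough illustration, the list above maps onto `transformers.TrainingArguments` as sketched below. This is a reconstruction, not the actual training script: the dataset, the `output_dir`, and any TRL `SFTTrainer`-specific arguments are unknown. The total train batch size of 128 follows from 8 per-device samples times 16 gradient-accumulation steps.

```python
# Illustrative reconstruction of the training configuration above.
# output_dir is a placeholder; the real script and dataset are undocumented.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter18_sftsd2",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # effective train batch size: 8 * 16 = 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,    # Adam settings match the values listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```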

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.5938        | 0.0511 | 5    | 1.2801          | 239496            |
| 0.8257        | 0.1021 | 10   | 1.3276          | 475896            |
| 0.3781        | 0.1532 | 15   | 1.5795          | 715416            |
| 0.203         | 0.2042 | 20   | 1.7945          | 958344            |
| 0.0828        | 0.2553 | 25   | 2.1123          | 1201808           |
| 0.0475        | 0.3063 | 30   | 2.2986          | 1441936           |
| 0.0255        | 0.3574 | 35   | 2.4401          | 1685128           |
| 0.0238        | 0.4084 | 40   | 2.5437          | 1923208           |
| 0.0205        | 0.4595 | 45   | 2.6059          | 2159992           |
| 0.0217        | 0.5105 | 50   | 2.6290          | 2396440           |
| 0.0236        | 0.5616 | 55   | 2.6241          | 2630120           |
| 0.0209        | 0.6126 | 60   | 2.6176          | 2871120           |
| 0.0202        | 0.6637 | 65   | 2.6088          | 3102520           |
| 0.0202        | 0.7147 | 70   | 2.6099          | 3337176           |
| 0.0194        | 0.7658 | 75   | 2.6224          | 3580072           |
| 0.022         | 0.8168 | 80   | 2.6128          | 3811176           |
| 0.0201        | 0.8679 | 85   | 2.6123          | 4053312           |
| 0.022         | 0.9190 | 90   | 2.6136          | 4283408           |
| 0.0201        | 0.9700 | 95   | 2.6248          | 4529888           |

### Framework versions

- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1