---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulatesubsample_iter4_sftsd1
  results: []
---

# collapse_gemma-2-2b_hs2_accumulatesubsample_iter4_sftsd1

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:

- Loss: 1.1502
- Num Input Tokens Seen: 5139552
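
Since this is a standard `transformers` causal-LM checkpoint, it loads the same way as the base model. A minimal inference sketch, assuming the checkpoint is hosted on the Hub under the repo id implied by this card (adjust the path if it differs):

```python
# Minimal inference sketch. The repo id below is assumed from this card's
# model name and may need to be adjusted to the actual Hub path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter4_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```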

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128 (train_batch_size × gradient_accumulation_steps)
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
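
The training script itself is not included in this card. The following is a sketch of how these settings map onto `trl`'s `SFTTrainer`, assuming an `SFTConfig`-era TRL release and a hypothetical text-column dataset; the actual training data and any subsampling logic for this run are not published:

```python
# Configuration sketch only: mirrors the hyperparameters listed above.
# The stand-in dataset below is hypothetical; the real data is unknown.
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

train_dataset = Dataset.from_dict({"text": ["example document one", "example document two"]})
eval_dataset = Dataset.from_dict({"text": ["example eval document"]})

training_args = SFTConfig(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter4_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,  # 8 * 16 = effective batch size of 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    dataset_text_field="text",
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the TrainingArguments defaults.
)

trainer = SFTTrainer(
    model="google/gemma-2-2b",  # base model, loaded internally from the Hub
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()
```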

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.4366        | 0.0537 | 5    | 1.2652          | 272808            |
| 1.205         | 0.1075 | 10   | 1.1792          | 543728            |
| 1.0601        | 0.1612 | 15   | 1.1613          | 817168            |
| 0.9601        | 0.2149 | 20   | 1.1532          | 1092816           |
| 0.9378        | 0.2686 | 25   | 1.1523          | 1371272           |
| 0.9852        | 0.3224 | 30   | 1.1648          | 1652968           |
| 0.9609        | 0.3761 | 35   | 1.1735          | 1931576           |
| 0.8948        | 0.4298 | 40   | 1.1661          | 2213968           |
| 0.8069        | 0.4835 | 45   | 1.1685          | 2496776           |
| 0.6446        | 0.5373 | 50   | 1.1695          | 2771880           |
| 0.7284        | 0.5910 | 55   | 1.1612          | 3049008           |
| 0.6245        | 0.6447 | 60   | 1.1637          | 3321840           |
| 0.5641        | 0.6985 | 65   | 1.1559          | 3594864           |
| 0.5613        | 0.7522 | 70   | 1.1590          | 3871512           |
| 0.6246        | 0.8059 | 75   | 1.1572          | 4140888           |
| 0.6635        | 0.8596 | 80   | 1.1523          | 4417664           |
| 0.626         | 0.9134 | 85   | 1.1528          | 4694904           |
| 0.579         | 0.9671 | 90   | 1.1477          | 4973416           |
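
Validation loss drops steeply over roughly the first 0.8M tokens and then plateaus near 1.15. A quick sketch for visualizing the curve, with the values copied from the table above (matplotlib assumed available):

```python
# Plot validation loss against input tokens seen (values from the table above).
import matplotlib.pyplot as plt

tokens_seen = [0, 272808, 543728, 817168, 1092816, 1371272, 1652968, 1931576,
               2213968, 2496776, 2771880, 3049008, 3321840, 3594864, 3871512,
               4140888, 4417664, 4694904, 4973416]
val_loss = [1.3909, 1.2652, 1.1792, 1.1613, 1.1532, 1.1523, 1.1648, 1.1735,
            1.1661, 1.1685, 1.1695, 1.1612, 1.1637, 1.1559, 1.1590, 1.1572,
            1.1523, 1.1528, 1.1477]

plt.plot(tokens_seen, val_loss, marker="o")
plt.xlabel("Input tokens seen")
plt.ylabel("Validation loss")
plt.title("collapse_gemma-2-2b_hs2_accumulatesubsample_iter4_sftsd1")
plt.show()
```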

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1