---
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_replace_iter6_sftsd1
    results: []
---

# collapse_gemma-2-2b_hs2_replace_iter6_sftsd1

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:

- Loss: 2.4776
- Num Input Tokens Seen: 4931704

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
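The listed `total_train_batch_size` is a derived value rather than an independent setting; a minimal sanity check of how it follows from the per-device batch size and gradient accumulation:

```python
# Hyperparameters copied from the list above.
train_batch_size = 8            # per-device train batch size
gradient_accumulation_steps = 16

# Gradients are accumulated over 16 micro-batches before each optimizer
# step, so the effective (total) train batch size is the product.
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 128, matching the listed value
```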

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.5189        | 0.0513 | 5    | 1.2749          | 258200            |
| 0.9714        | 0.1026 | 10   | 1.2495          | 517512            |
| 0.6202        | 0.1539 | 15   | 1.4088          | 775024            |
| 0.3538        | 0.2053 | 20   | 1.6032          | 1026560           |
| 0.2158        | 0.2566 | 25   | 1.8219          | 1270944           |
| 0.1167        | 0.3079 | 30   | 2.0376          | 1527480           |
| 0.0654        | 0.3592 | 35   | 2.2660          | 1777448           |
| 0.0393        | 0.4105 | 40   | 2.3894          | 2029984           |
| 0.031         | 0.4618 | 45   | 2.4278          | 2278552           |
| 0.0292        | 0.5131 | 50   | 2.4650          | 2534640           |
| 0.0258        | 0.5645 | 55   | 2.4896          | 2783408           |
| 0.0255        | 0.6158 | 60   | 2.4676          | 3035384           |
| 0.0235        | 0.6671 | 65   | 2.4426          | 3294576           |
| 0.0249        | 0.7184 | 70   | 2.4442          | 3548680           |
| 0.0231        | 0.7697 | 75   | 2.4505          | 3807912           |
| 0.0249        | 0.8210 | 80   | 2.4582          | 4065352           |
| 0.0225        | 0.8724 | 85   | 2.4512          | 4318600           |
| 0.0216        | 0.9237 | 90   | 2.4613          | 4577512           |
| 0.021         | 0.9750 | 95   | 2.4749          | 4833752           |
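The trajectory above shows validation loss bottoming out early and then climbing while training loss keeps falling, which is the usual signature of overfitting. A minimal sketch (with the `(step, validation_loss)` pairs copied from the table) that locates the best eval checkpoint:

```python
# (step, validation_loss) pairs copied from the results table above.
eval_history = [
    (0, 1.3909), (5, 1.2749), (10, 1.2495), (15, 1.4088), (20, 1.6032),
    (25, 1.8219), (30, 2.0376), (35, 2.2660), (40, 2.3894), (45, 2.4278),
    (50, 2.4650), (55, 2.4896), (60, 2.4676), (65, 2.4426), (70, 2.4442),
    (75, 2.4505), (80, 2.4582), (85, 2.4512), (90, 2.4613), (95, 2.4749),
]

# Find the step with the lowest validation loss.
best_step, best_loss = min(eval_history, key=lambda row: row[1])
print(best_step, best_loss)  # step 10, loss 1.2495
```

Eval loss at the final step (2.4749) is nearly double the best value (1.2495 at step 10), so the final weights reported here are far from the best-eval checkpoint.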

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1