---
license: gemma
base_model: google/gemma-2-9b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-9b_hs2_replace_iter4_sftsd2
    results: []
---

# collapse_gemma-2-9b_hs2_replace_iter4_sftsd2

This model is a fine-tuned version of [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b) on an unknown dataset.
It achieves the following results on the evaluation set:

- Loss: 1.4790
- Num input tokens seen: 4,639,036

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
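For convenience, the hyperparameters above can be collected into a plain Python dict. This is only a sketch (the original training script is not included in this card; the key names merely mirror `transformers.TrainingArguments` field names), but it makes the batch-size arithmetic explicit: the reported total train batch size of 128 is the per-device batch size times the gradient-accumulation steps, assuming a single device.

```python
# Hyperparameters copied from the list above (sketch only; not the
# original training invocation).
hparams = {
    "learning_rate": 8e-06,
    "train_batch_size": 4,
    "eval_batch_size": 16,
    "seed": 2,
    "gradient_accumulation_steps": 32,
    "lr_scheduler_type": "constant_with_warmup",
    "lr_scheduler_warmup_ratio": 0.05,
    "num_epochs": 1,
}

# Effective (total) train batch size: per-device batch size x
# gradient-accumulation steps, assuming a single device.
total_train_batch_size = (
    hparams["train_batch_size"] * hparams["gradient_accumulation_steps"]
)
print(total_train_batch_size)  # → 128
```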

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.2335          | 0                 |
| 1.2061        | 0.0511 | 5    | 1.0694          | 240324            |
| 0.5247        | 0.1022 | 10   | 1.1930          | 482000            |
| 0.2089        | 0.1533 | 15   | 1.3107          | 712380            |
| 0.0939        | 0.2044 | 20   | 1.4465          | 957020            |
| 0.0504        | 0.2555 | 25   | 1.5378          | 1188912           |
| 0.0456        | 0.3066 | 30   | 1.4778          | 1428132           |
| 0.0331        | 0.3577 | 35   | 1.4145          | 1677336           |
| 0.0238        | 0.4088 | 40   | 1.4888          | 1914204           |
| 0.0255        | 0.4599 | 45   | 1.5425          | 2146180           |
| 0.0243        | 0.5110 | 50   | 1.5185          | 2379516           |
| 0.0381        | 0.5621 | 55   | 1.4742          | 2619096           |
| 0.0305        | 0.6132 | 60   | 1.4191          | 2862804           |
| 0.0227        | 0.6643 | 65   | 1.4256          | 3103004           |
| 0.021         | 0.7154 | 70   | 1.4350          | 3346964           |
| 0.0279        | 0.7665 | 75   | 1.4590          | 3587168           |
| 0.0242        | 0.8176 | 80   | 1.5009          | 3830384           |
| 0.0262        | 0.8687 | 85   | 1.4784          | 4068408           |
| 0.0244        | 0.9198 | 90   | 1.4782          | 4308452           |
| 0.0228        | 0.9709 | 95   | 1.4777          | 4542732           |
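One pattern worth noting in the training results: training loss collapses quickly (from 1.2061 to ~0.02) while validation loss bottoms out at step 5 (1.0694) and then climbs, which suggests the model fits its fine-tuning data very early. A short script can recover the best-validation checkpoint directly from the table (values transcribed from the results above):

```python
# Validation losses from the training-results table, keyed by step.
val_loss = {
    0: 1.2335, 5: 1.0694, 10: 1.1930, 15: 1.3107, 20: 1.4465,
    25: 1.5378, 30: 1.4778, 35: 1.4145, 40: 1.4888, 45: 1.5425,
    50: 1.5185, 55: 1.4742, 60: 1.4191, 65: 1.4256, 70: 1.4350,
    75: 1.4590, 80: 1.5009, 85: 1.4784, 90: 1.4782, 95: 1.4777,
}

# Step with the lowest validation loss.
best_step = min(val_loss, key=val_loss.get)
print(best_step, val_loss[best_step])  # → 5 1.0694
```

The final reported loss (1.4790) is close to the step-95 value, i.e., the card reports the end-of-training checkpoint rather than the best-validation one.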

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1