---
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_replace_iter3_sftsd0
    results: []
---

collapse_gemma-2-2b_hs2_replace_iter3_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.9480
  • Num Input Tokens Seen: 5,391,904
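As a sketch only: a checkpoint published under this card's name would typically be loaded with the standard Transformers API. The repo id below is inferred from the model name above and is an assumption; downloading the ~2B-parameter weights requires network access and sufficient memory.

```python
# Sketch: loading this fine-tuned checkpoint via Transformers.
# The repo id is an assumption inferred from the model name in this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter3_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Generate a short continuation from a toy prompt.
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```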

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
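The reported total_train_batch_size is not set directly; it is derived from the per-device batch size and the gradient accumulation steps. A minimal sketch of that relationship, using the values listed above:

```python
# Hyperparameters from the list above, collected in one dict.
# total_train_batch_size is derived: per-device batch size times
# gradient accumulation steps (8 * 16 = 128).
hparams = {
    "learning_rate": 8e-06,
    "train_batch_size": 8,          # per device
    "eval_batch_size": 16,
    "seed": 0,
    "gradient_accumulation_steps": 16,
    "lr_scheduler_type": "constant_with_warmup",
    "lr_scheduler_warmup_ratio": 0.05,
    "num_epochs": 1,
}

total_train_batch_size = (
    hparams["train_batch_size"] * hparams["gradient_accumulation_steps"]
)
print(total_train_batch_size)  # 128
```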

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.5565        | 0.0525 | 5    | 1.2728          | 291256            |
| 1.0932        | 0.1050 | 10   | 1.2207          | 576856            |
| 0.7974        | 0.1575 | 15   | 1.3414          | 859856            |
| 0.6902        | 0.2100 | 20   | 1.4786          | 1143424           |
| 0.408         | 0.2625 | 25   | 1.5763          | 1425944           |
| 0.2647        | 0.3150 | 30   | 1.7198          | 1705536           |
| 0.183         | 0.3675 | 35   | 1.8118          | 1986632           |
| 0.1028        | 0.4199 | 40   | 1.8767          | 2264728           |
| 0.0944        | 0.4724 | 45   | 1.9227          | 2547680           |
| 0.054         | 0.5249 | 50   | 1.9488          | 2835016           |
| 0.0758        | 0.5774 | 55   | 1.8554          | 3117064           |
| 0.0682        | 0.6299 | 60   | 1.8586          | 3408136           |
| 0.0638        | 0.6824 | 65   | 1.9812          | 3686176           |
| 0.0693        | 0.7349 | 70   | 2.0412          | 3968392           |
| 0.049         | 0.7874 | 75   | 2.0365          | 4248416           |
| 0.0394        | 0.8399 | 80   | 1.9713          | 4537376           |
| 0.0341        | 0.8924 | 85   | 1.9624          | 4822888           |
| 0.0517        | 0.9449 | 90   | 1.9243          | 5112896           |
| 0.0362        | 0.9974 | 95   | 1.9480          | 5391904           |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1