---
license: gemma
base_model: google/gemma-2-27b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-27b_hs2_replace_iter3_sftsd0
  results: []
---

# collapse_gemma-2-27b_hs2_replace_iter3_sftsd0

This model is a fine-tuned version of google/gemma-2-27b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.3653
  • Num Input Tokens Seen: 3955416

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
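
As a quick sanity check on the batch-size figures above, the reported total train batch size is the per-device batch size multiplied by the gradient-accumulation steps (and by the number of devices, which the card does not state; a single device is assumed here):

```python
# Effective (total) train batch size implied by the hyperparameters above.
train_batch_size = 4             # per-device micro-batch size
gradient_accumulation_steps = 32
num_devices = 1                  # assumption: device count is not stated in the card

effective_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(effective_batch_size)      # 128, matching total_train_batch_size
```

This is why `total_train_batch_size: 128` appears alongside `train_batch_size: 4`: gradients from 32 micro-batches are accumulated before each optimizer step.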

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.1282          | 0                 |
| 3.8489        | 0.0583 | 5    | 1.0535          | 228936            |
| 3.3414        | 0.1165 | 10   | 1.1298          | 463812            |
| 2.8437        | 0.1748 | 15   | 1.1488          | 702592            |
| 1.9341        | 0.2331 | 20   | 1.2179          | 938224            |
| 1.1621        | 0.2913 | 25   | 1.2570          | 1165920           |
| 0.6806        | 0.3496 | 30   | 1.2791          | 1403276           |
| 0.6728        | 0.4079 | 35   | 1.2535          | 1650592           |
| 0.5266        | 0.4661 | 40   | 1.2409          | 1880524           |
| 0.5377        | 0.5244 | 45   | 1.2414          | 2104356           |
| 0.4042        | 0.5827 | 50   | 1.2466          | 2335700           |
| 0.7168        | 0.6409 | 55   | 1.2873          | 2564852           |
| 0.3333        | 0.6992 | 60   | 1.3003          | 2791324           |
| 0.5753        | 0.7575 | 65   | 1.3164          | 3032688           |
| 0.3997        | 0.8157 | 70   | 1.3235          | 3267132           |
| 0.3566        | 0.8740 | 75   | 1.3464          | 3502604           |
| 0.4565        | 0.9323 | 80   | 1.3853          | 3727432           |
| 0.1841        | 0.9905 | 85   | 1.3653          | 3955416           |

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1