collapse_gemma-2-9b_hs2_replace_iter3_sftsd2

This model is a fine-tuned version of google/gemma-2-9b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.3331
  • Num Input Tokens Seen: 4767740

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
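The list above implies an effective batch size of 128 (4 per device × 32 gradient-accumulation steps), matching the reported total_train_batch_size. As a minimal sketch, the same settings can be written as a plain Python dict using `TrainingArguments`-style field names (the names are an assumption; this is illustrative, not the original training script):

```python
# Sketch of the reported training configuration, using
# TrainingArguments-style field names (assumed; not the original script).
config = {
    "learning_rate": 8e-06,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 16,
    "seed": 2,
    "gradient_accumulation_steps": 32,
    "lr_scheduler_type": "constant_with_warmup",
    "warmup_ratio": 0.05,
    "num_train_epochs": 1,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
}

# The effective (total) train batch size is the product of the
# per-device batch size and the number of accumulation steps.
effective_batch = (
    config["per_device_train_batch_size"] * config["gradient_accumulation_steps"]
)
print(effective_batch)  # 128
```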

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.2335          | 0                 |
| 1.1673        | 0.0519 | 5    | 1.0681          | 247836            |
| 0.6647        | 0.1037 | 10   | 1.1160          | 500564            |
| 0.3357        | 0.1556 | 15   | 1.2251          | 752148            |
| 0.1216        | 0.2075 | 20   | 1.2090          | 998756            |
| 0.0601        | 0.2593 | 25   | 1.2567          | 1249040           |
| 0.0575        | 0.3112 | 30   | 1.2783          | 1498900           |
| 0.0399        | 0.3630 | 35   | 1.2369          | 1750044           |
| 0.0626        | 0.4149 | 40   | 1.2392          | 2000384           |
| 0.0329        | 0.4668 | 45   | 1.1987          | 2257268           |
| 0.0283        | 0.5186 | 50   | 1.2369          | 2506816           |
| 0.0306        | 0.5705 | 55   | 1.2321          | 2749044           |
| 0.0274        | 0.6224 | 60   | 1.2139          | 3001308           |
| 0.0390        | 0.6742 | 65   | 1.2486          | 3246720           |
| 0.0376        | 0.7261 | 70   | 1.2664          | 3490852           |
| 0.0312        | 0.7780 | 75   | 1.2641          | 3736420           |
| 0.0400        | 0.8298 | 80   | 1.2473          | 3980484           |
| 0.0344        | 0.8817 | 85   | 1.3026          | 4227696           |
| 0.0329        | 0.9335 | 90   | 1.3221          | 4471696           |
| 0.0317        | 0.9854 | 95   | 1.3285          | 4718104           |
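Validation loss is lowest at the very first evaluation and drifts upward while training loss keeps falling, which is worth keeping in mind when selecting a checkpoint. A small script to locate the best-validation step from the table above (all numbers are copied from the table; nothing else is assumed):

```python
# (step, validation_loss) pairs copied from the training-results table.
val_losses = [
    (0, 1.2335), (5, 1.0681), (10, 1.1160), (15, 1.2251), (20, 1.2090),
    (25, 1.2567), (30, 1.2783), (35, 1.2369), (40, 1.2392), (45, 1.1987),
    (50, 1.2369), (55, 1.2321), (60, 1.2139), (65, 1.2486), (70, 1.2664),
    (75, 1.2641), (80, 1.2473), (85, 1.3026), (90, 1.3221), (95, 1.3285),
]

# Find the evaluation step with the lowest validation loss.
best_step, best_loss = min(val_losses, key=lambda pair: pair[1])
print(best_step, best_loss)  # 5 1.0681
```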

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
Safetensors

  • Model size: 9.24B params
  • Tensor type: BF16

Base model

  • google/gemma-2-9b