collapse_gemma-2-9b_hs2_replace_iter6_sftsd2

This model is a fine-tuned version of google/gemma-2-9b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.5665
  • Num Input Tokens Seen: 4530600
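For reference, a checkpoint like this can typically be loaded with the standard Transformers API. This is a minimal sketch, not part of the original card: the model id is this repository's name, and loading a 9B-parameter model in bfloat16 (the stored tensor type) requires a suitably large GPU.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_replace_iter6_sftsd2"

# Load the fine-tuned checkpoint in bfloat16, matching the stored tensor type.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Generate a short continuation from a prompt.
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```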

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
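The total train batch size above is not an independent setting; it follows from the per-device batch size and gradient accumulation. A minimal sketch of the arithmetic:

```python
# Hyperparameters from the list above.
train_batch_size = 4             # per-device train batch size
gradient_accumulation_steps = 32

# Gradients are accumulated over 32 micro-batches of 4 examples each,
# so each optimizer step sees 4 * 32 = 128 examples.
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # → 128
```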

Training results

Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen
------------- | ------ | ---- | --------------- | -----------------
No log        | 0      | 0    | 1.2335          | 0
1.3097        | 0.0511 | 5    | 1.0698          | 229532
0.4786        | 0.1021 | 10   | 1.2058          | 462632
0.1486        | 0.1532 | 15   | 1.3852          | 694572
0.0712        | 0.2043 | 20   | 1.4796          | 924920
0.0308        | 0.2553 | 25   | 1.6805          | 1159600
0.0246        | 0.3064 | 30   | 1.6485          | 1394384
0.0249        | 0.3575 | 35   | 1.7296          | 1625220
0.0245        | 0.4086 | 40   | 1.7449          | 1861604
0.0279        | 0.4596 | 45   | 1.6980          | 2092352
0.0308        | 0.5107 | 50   | 1.5217          | 2328876
0.0227        | 0.5618 | 55   | 1.4251          | 2558408
0.0198        | 0.6128 | 60   | 1.4416          | 2786776
0.0244        | 0.6639 | 65   | 1.4659          | 3022844
0.0223        | 0.7150 | 70   | 1.4740          | 3252704
0.0217        | 0.7660 | 75   | 1.4962          | 3486108
0.0223        | 0.8171 | 80   | 1.5131          | 3722364
0.0211        | 0.8682 | 85   | 1.5303          | 3959596
0.0202        | 0.9192 | 90   | 1.5489          | 4199296
0.0213        | 0.9703 | 95   | 1.5624          | 4434360
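Note that validation loss reaches its minimum very early (step 5) and then rises while training loss keeps falling, a typical overfitting pattern. A minimal sketch (not from the original card) of recovering the best evaluation checkpoint from the logged history:

```python
# (step, validation_loss) pairs copied from the training results table above.
eval_history = [
    (0, 1.2335), (5, 1.0698), (10, 1.2058), (15, 1.3852), (20, 1.4796),
    (25, 1.6805), (30, 1.6485), (35, 1.7296), (40, 1.7449), (45, 1.6980),
    (50, 1.5217), (55, 1.4251), (60, 1.4416), (65, 1.4659), (70, 1.4740),
    (75, 1.4962), (80, 1.5131), (85, 1.5303), (90, 1.5489), (95, 1.5624),
]

# The checkpoint with the lowest validation loss.
best_step, best_loss = min(eval_history, key=lambda pair: pair[1])
print(best_step, best_loss)  # → 5 1.0698
```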

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
Model stats

  • Model size: 9.24B params
  • Tensor type: BF16
  • Format: Safetensors
