collapse_gemma-2-2b_hs2_replace_iter5_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.2067
  • Num Input Tokens Seen: 5160120
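
The card itself ships no usage example, so here is a minimal inference sketch, assuming the checkpoint is loaded from the Hub repository named in the model tree below and run in BF16 (the tensor type listed further down). The prompt is a placeholder.

```python
# Minimal inference sketch (not from the card): loads the checkpoint in BF16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter5_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16, per the tensor type listed on this card
    device_map="auto",           # assumes `accelerate` is installed
)

# Placeholder prompt; the card does not document intended usage.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```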

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto TrainingArguments follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
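
For readers who want to reproduce this configuration, the sketch below maps the hyperparameters above onto transformers.TrainingArguments. The output directory and the Trainer wiring are placeholders (the training data is undocumented), and the Adam settings shown are the optimizer's defaults, matching the values listed.

```python
# Sketch only: maps the listed hyperparameters onto TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter5_sftsd0",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,  # 8 per device x 16 steps = 128 effective
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                  # Adam settings as listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumption, inferred from the BF16 tensor type reported below
)
```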

Training results

Training Loss   Epoch    Step   Validation Loss   Input Tokens Seen
No log          0        0      1.3909            0
1.4604          0.0511   5      1.2762            253360
1.0513          0.1021   10     1.2509            521584
0.7603          0.1532   15     1.3994            785240
0.4428          0.2042   20     1.6464            1046120
0.2467          0.2553   25     1.7842            1316104
0.1764          0.3063   30     1.9304            1582624
0.0943          0.3574   35     2.0234            1851432
0.0415          0.4084   40     2.1085            2122560
0.0341          0.4595   45     2.1800            2395064
0.0295          0.5105   50     2.1900            2661576
0.0274          0.5616   55     2.1805            2933440
0.0251          0.6126   60     2.2082            3208152
0.0290          0.6637   65     2.1911            3473984
0.0258          0.7147   70     2.1418            3743624
0.0259          0.7658   75     2.1601            4002760
0.0230          0.8168   80     2.2033            4261440
0.0279          0.8679   85     2.2172            4530184
0.0256          0.9190   90     2.2190            4797024
0.0274          0.9700   95     2.2002            5054352
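
Note the trajectory: validation loss bottoms out early (1.2509 at step 10) and then rises steadily while training loss approaches zero. A small plotting sketch, with the values transcribed from the table above and matplotlib assumed to be available:

```python
# Sketch: plot the loss curves transcribed from the table above.
import matplotlib.pyplot as plt

steps      = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50,
              55, 60, 65, 70, 75, 80, 85, 90, 95]
train_loss = [None, 1.4604, 1.0513, 0.7603, 0.4428, 0.2467, 0.1764, 0.0943,
              0.0415, 0.0341, 0.0295, 0.0274, 0.0251, 0.0290, 0.0258, 0.0259,
              0.0230, 0.0279, 0.0256, 0.0274]  # step 0 logged as "No log"
val_loss   = [1.3909, 1.2762, 1.2509, 1.3994, 1.6464, 1.7842, 1.9304, 2.0234,
              2.1085, 2.1800, 2.1900, 2.1805, 2.2082, 2.1911, 2.1418, 2.1601,
              2.2033, 2.2172, 2.2190, 2.2002]

plt.plot(steps[1:], train_loss[1:], label="training loss")
plt.plot(steps, val_loss, label="validation loss")
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.show()
```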

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model size

  • 2.61B params (Safetensors, BF16)

Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter5_sftsd0

  • Base model: google/gemma-2-2b