collapse_gemma-2-2b_hs2_accumulatesubsample_iter8_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1905
  • Num Input Tokens Seen: 5048840
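A minimal inference sketch follows, assuming the checkpoint is hosted on the Hugging Face Hub under the repository id shown in the model tree at the end of this card; the BF16 dtype matches the tensor type listed there, and `device_map="auto"` assumes `accelerate` is installed:

```python
# Minimal inference sketch; the repository id is taken from this card's model tree.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter8_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 tensor type listed below
    device_map="auto",           # requires accelerate; drop for CPU-only use
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```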

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a configuration sketch follows the list:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
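As a hedged illustration, the hyperparameters above map onto a Transformers `TrainingArguments` configuration roughly as follows. The dataset, model loading, and `Trainer` wiring are omitted because the training data is not documented in this card, and the total train batch size of 128 assumes a single device (8 per device × 16 accumulation steps):

```python
# Sketch of the training configuration implied by the hyperparameter list above.
# Dataset and Trainer setup are not documented in this card and are omitted.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter8_sftsd2",
    learning_rate=8e-6,
    per_device_train_batch_size=8,    # train_batch_size: 8
    per_device_eval_batch_size=16,    # eval_batch_size: 16
    seed=2,
    gradient_accumulation_steps=16,   # 8 * 16 = 128 total (assuming one device)
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                   # Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                        # matches the BF16 tensor type listed below
)
```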

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.403         | 0.0533 | 5    | 1.2724          | 279256            |
| 1.1785        | 0.1066 | 10   | 1.1963          | 545440            |
| 1.0295        | 0.1599 | 15   | 1.1799          | 811104            |
| 0.967         | 0.2132 | 20   | 1.1842          | 1083896           |
| 0.7945        | 0.2665 | 25   | 1.1937          | 1361912           |
| 0.786         | 0.3198 | 30   | 1.2116          | 1632696           |
| 0.8056        | 0.3731 | 35   | 1.2189          | 1905960           |
| 0.7204        | 0.4264 | 40   | 1.2143          | 2171384           |
| 0.7197        | 0.4797 | 45   | 1.2007          | 2448800           |
| 0.6811        | 0.5330 | 50   | 1.2093          | 2721560           |
| 0.692         | 0.5863 | 55   | 1.2032          | 2989280           |
| 0.5352        | 0.6396 | 60   | 1.2017          | 3264080           |
| 0.5358        | 0.6929 | 65   | 1.1925          | 3537184           |
| 0.4779        | 0.7462 | 70   | 1.2035          | 3812032           |
| 0.4526        | 0.7995 | 75   | 1.1994          | 4079120           |
| 0.5517        | 0.8528 | 80   | 1.1940          | 4348552           |
| 0.5031        | 0.9061 | 85   | 1.1921          | 4616056           |
| 0.507         | 0.9594 | 90   | 1.1952          | 4883528           |
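Validation loss bottoms out at 1.1799 by step 15 and then hovers in the 1.19–1.22 range for the rest of training. For reference, a short sketch that plots the validation-loss curve from the table above (values copied verbatim; `matplotlib` is the only assumption):

```python
# Plot validation loss vs. training step, using the table above verbatim.
import matplotlib.pyplot as plt

steps = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45,
         50, 55, 60, 65, 70, 75, 80, 85, 90]
val_loss = [1.3909, 1.2724, 1.1963, 1.1799, 1.1842, 1.1937, 1.2116,
            1.2189, 1.2143, 1.2007, 1.2093, 1.2032, 1.2017, 1.1925,
            1.2035, 1.1994, 1.1940, 1.1921, 1.1952]

plt.plot(steps, val_loss, marker="o")
plt.xlabel("Step")
plt.ylabel("Validation loss")
plt.title("collapse_gemma-2-2b_hs2_accumulatesubsample_iter8_sftsd2")
plt.show()
```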

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
Safetensors

  • Model size: 2.61B params
  • Tensor type: BF16

Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter8_sftsd2

  • Base model: google/gemma-2-2b