collapse_gemma-2-2b_hs2_accumulatesubsample_iter4_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1629
  • Num Input Tokens Seen: 5250360
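
The card does not include a usage example, so here is a minimal sketch of loading the checkpoint with transformers. The repo id comes from the card; the prompt is arbitrary, and loading in bfloat16 matches the published BF16 weights.

```python
# Minimal usage sketch (not part of the original card): load the checkpoint
# and generate. Assumes a recent transformers install and a device with
# bfloat16 support, matching the BF16 tensor type of the published weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter4_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```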

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
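
For reference, here is a hedged sketch of how these settings map onto transformers TrainingArguments. Only the numeric values come from the card; the output directory and the bf16 flag are assumptions, and the original training script is not published here. Note that total_train_batch_size = train_batch_size × gradient_accumulation_steps = 8 × 16 = 128 on a single device.

```python
# Sketch only: the hyperparameters above expressed as TrainingArguments.
# Everything except the listed values is an assumption.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter4_sftsd2",  # hypothetical
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 effective train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    bf16=True,  # assumption, matching the published BF16 weights
)
```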

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.4507        | 0.0527 | 5    | 1.2649          | 285024            |
| 1.2669        | 0.1055 | 10   | 1.1794          | 566176            |
| 1.2492        | 0.1582 | 15   | 1.1551          | 833640            |
| 1.2366        | 0.2109 | 20   | 1.1399          | 1110504           |
| 1.0557        | 0.2637 | 25   | 1.1411          | 1388928           |
| 0.9926        | 0.3164 | 30   | 1.1517          | 1667696           |
| 0.9331        | 0.3691 | 35   | 1.1582          | 1947440           |
| 0.8809        | 0.4219 | 40   | 1.1619          | 2225640           |
| 0.8058        | 0.4746 | 45   | 1.1649          | 2505560           |
| 0.8295        | 0.5274 | 50   | 1.1664          | 2785208           |
| 0.8397        | 0.5801 | 55   | 1.1773          | 3063096           |
| 0.7422        | 0.6328 | 60   | 1.1690          | 3345784           |
| 0.7239        | 0.6856 | 65   | 1.1694          | 3626168           |
| 0.6693        | 0.7383 | 70   | 1.1631          | 3907224           |
| 0.6977        | 0.7910 | 75   | 1.1692          | 4190504           |
| 0.6348        | 0.8438 | 80   | 1.1727          | 4469088           |
| 0.6957        | 0.8965 | 85   | 1.1625          | 4749008           |
| 0.5706        | 0.9492 | 90   | 1.1691          | 5028272           |
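
For quick inspection of the curve, a minimal plotting sketch follows. The data points are copied verbatim from the table above; matplotlib is assumed to be available and is not part of the original card.

```python
# Sketch: plot validation loss against training step from the table above.
import matplotlib.pyplot as plt

steps = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45,
         50, 55, 60, 65, 70, 75, 80, 85, 90]
val_loss = [1.3909, 1.2649, 1.1794, 1.1551, 1.1399, 1.1411, 1.1517,
            1.1582, 1.1619, 1.1649, 1.1664, 1.1773, 1.1690, 1.1694,
            1.1631, 1.1692, 1.1727, 1.1625, 1.1691]

plt.plot(steps, val_loss, marker="o")
plt.xlabel("Step")
plt.ylabel("Validation loss")
plt.title("collapse_gemma-2-2b_hs2_accumulatesubsample_iter4_sftsd2")
plt.show()
```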

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
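
To reproduce results against the same stack, a small sketch (not from the card) that checks the local environment against these pinned versions:

```python
# Sketch: compare installed package versions with those the model was
# trained under. Package names are the pip distribution names.
from importlib.metadata import PackageNotFoundError, version

expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}

for pkg, want in expected.items():
    try:
        have = version(pkg)
    except PackageNotFoundError:
        have = "not installed"
    status = "OK" if have == want else f"mismatch (found {have})"
    print(f"{pkg}: expected {want} -> {status}")
```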