collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0882
  • Num Input Tokens Seen: 10720872

Model description

More information needed

Intended uses & limitations

More information needed
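
Pending more detail from the author, the sketch below shows one way to load this checkpoint for inference with Transformers. Only the repo id comes from this card; the prompt and generation settings are illustrative assumptions, not documented usage.

```python
# Minimal loading sketch (assumes transformers, torch, and accelerate are installed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is stored in BF16
    device_map="auto",           # requires accelerate; places weights automatically
)

prompt = "The key to a good fine-tune is"  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```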

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
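
As a rough guide, the listed hyperparameters map onto the Transformers `TrainingArguments` as sketched below. This is an assumption-laden reconstruction, not the author's actual training script: the dataset is unknown, so data and model wiring are omitted, and `output_dir` is illustrative.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd2",  # illustrative
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size, assuming a single device
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    # Adam betas (0.9, 0.999) and epsilon 1e-08 are the Transformers defaults,
    # so no optimizer override is needed.
)
```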

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.593         | 0.0266 | 5    | 1.3326          | 282968            |
| 1.468         | 0.0531 | 10   | 1.2105          | 570240            |
| 1.2681        | 0.0797 | 15   | 1.1563          | 856832            |
| 1.2072        | 0.1063 | 20   | 1.1314          | 1140048           |
| 1.1683        | 0.1328 | 25   | 1.1132          | 1423424           |
| 1.0607        | 0.1594 | 30   | 1.1129          | 1709744           |
| 1.1077        | 0.1860 | 35   | 1.1118          | 2001304           |
| 1.0482        | 0.2126 | 40   | 1.1135          | 2294992           |
| 1.074         | 0.2391 | 45   | 1.1164          | 2584408           |
| 0.9352        | 0.2657 | 50   | 1.1174          | 2875416           |
| 0.8739        | 0.2923 | 55   | 1.1168          | 3154376           |
| 0.8673        | 0.3188 | 60   | 1.1216          | 3443824           |
| 0.8946        | 0.3454 | 65   | 1.1154          | 3729728           |
| 0.7916        | 0.3720 | 70   | 1.1306          | 4012624           |
| 0.9486        | 0.3985 | 75   | 1.1155          | 4302744           |
| 0.721         | 0.4251 | 80   | 1.1205          | 4583480           |
| 0.8319        | 0.4517 | 85   | 1.1200          | 4859656           |
| 0.6664        | 0.4782 | 90   | 1.1144          | 5141344           |
| 0.7822        | 0.5048 | 95   | 1.1131          | 5420960           |
| 0.736         | 0.5314 | 100  | 1.1124          | 5708552           |
| 0.8007        | 0.5580 | 105  | 1.1126          | 5987736           |
| 0.6431        | 0.5845 | 110  | 1.1073          | 6271504           |
| 0.6754        | 0.6111 | 115  | 1.1048          | 6559824           |
| 0.8061        | 0.6377 | 120  | 1.1066          | 6848192           |
| 0.7043        | 0.6642 | 125  | 1.1044          | 7131224           |
| 0.6619        | 0.6908 | 130  | 1.1028          | 7410960           |
| 0.6988        | 0.7174 | 135  | 1.0991          | 7699432           |
| 0.7132        | 0.7439 | 140  | 1.0989          | 7986208           |
| 0.6748        | 0.7705 | 145  | 1.0963          | 8274264           |
| 0.7033        | 0.7971 | 150  | 1.0959          | 8560800           |
| 0.7145        | 0.8236 | 155  | 1.0943          | 8847792           |
| 0.6951        | 0.8502 | 160  | 1.0928          | 9134464           |
| 0.6958        | 0.8768 | 165  | 1.0908          | 9420928           |
| 0.6325        | 0.9034 | 170  | 1.0915          | 9697944           |
| 0.6244        | 0.9299 | 175  | 1.0886          | 9987896           |
| 0.6517        | 0.9565 | 180  | 1.0902          | 10269648          |
| 0.7256        | 0.9831 | 185  | 1.0874          | 10553360          |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
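
To reproduce this environment, the versions above can be pinned in a requirements file; the exact file below is an assumption, not shipped with the model.

```
transformers==4.44.0
torch==2.4.0        # the card reports the +cu121 (CUDA 12.1) build
datasets==2.20.0
tokenizers==0.19.1
```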