collapse_gemma-2-2b_hs2_replace_iter17_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.6191
  • Num Input Tokens Seen: 4642408
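As a sketch of how the checkpoint could be used (assuming it is published on the Hugging Face Hub under `RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter17_sftsd2`, the repository this card belongs to), it can be loaded with `transformers`:

```python
# Sketch: load the fine-tuned checkpoint for text generation.
# Assumes network access to the Hugging Face Hub and enough memory
# for a 2.6B-parameter model in BF16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter17_sftsd2"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```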

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
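The effective batch size and warmup length follow directly from the values above; a minimal sketch of the arithmetic (the total step count of 98 is an assumption inferred from the training-results table, which ends at step 95 at epoch 0.97):

```python
import math

# Hyperparameters from the list above.
train_batch_size = 8             # per-device micro-batch
gradient_accumulation_steps = 16
warmup_ratio = 0.05

# Effective batch size: micro-batch times accumulation steps.
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)    # 128, matching the reported value

# With constant_with_warmup, the learning rate ramps linearly over the
# first warmup_ratio fraction of optimizer steps, then holds at 8e-06.
total_steps = 98                 # assumed from the training results
warmup_steps = math.ceil(total_steps * warmup_ratio)
print(warmup_steps)              # 5
```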

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.587         | 0.0511 | 5    | 1.2806          | 240384            |
| 0.8216        | 0.1021 | 10   | 1.3186          | 475152            |
| 0.3812        | 0.1532 | 15   | 1.5675          | 715944            |
| 0.2267        | 0.2042 | 20   | 1.7862          | 958824            |
| 0.0935        | 0.2553 | 25   | 2.0985          | 1201088           |
| 0.056         | 0.3063 | 30   | 2.2426          | 1443344           |
| 0.0279        | 0.3574 | 35   | 2.4156          | 1686944           |
| 0.0236        | 0.4084 | 40   | 2.5409          | 1929368           |
| 0.0208        | 0.4595 | 45   | 2.5985          | 2169056           |
| 0.0191        | 0.5105 | 50   | 2.6249          | 2405328           |
| 0.0212        | 0.5616 | 55   | 2.6265          | 2641384           |
| 0.0201        | 0.6126 | 60   | 2.6278          | 2883048           |
| 0.0198        | 0.6637 | 65   | 2.6242          | 3119208           |
| 0.0194        | 0.7147 | 70   | 2.6267          | 3352216           |
| 0.0191        | 0.7658 | 75   | 2.6306          | 3595840           |
| 0.0227        | 0.8168 | 80   | 2.6203          | 3829328           |
| 0.0194        | 0.8679 | 85   | 2.6171          | 4070984           |
| 0.0215        | 0.9190 | 90   | 2.6168          | 4301112           |
| 0.0192        | 0.9700 | 95   | 2.6253          | 4547840           |

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1