collapse_gemma-2-2b_hs2_replace_iter3_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.9480
  • Num Input Tokens Seen: 5391904
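
The card itself ships no usage snippet; below is a minimal inference sketch, assuming the checkpoint is public on the Hugging Face Hub under RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter3_sftsd0 (the repo id from the model tree at the bottom of this card) and that the standard transformers causal-LM API applies. The prompt is illustrative.

```python
# Minimal inference sketch (assumes the repo id below is correct and public).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter3_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```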

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
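
These settings map directly onto transformers' TrainingArguments. The sketch below mirrors the list above; the output_dir is hypothetical, and the dataset/model wiring is omitted because the training data is unknown.

```python
# Sketch of the hyperparameters above as a transformers TrainingArguments
# object. output_dir is hypothetical; the training dataset is unknown.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter3_sftsd0",  # hypothetical
    learning_rate=8e-06,
    per_device_train_batch_size=8,   # train_batch_size: 8
    per_device_eval_batch_size=16,   # eval_batch_size: 16
    seed=0,
    gradient_accumulation_steps=16,  # 8 x 16 = total_train_batch_size of 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the defaults, so no
    # explicit adam_beta1/adam_beta2/adam_epsilon arguments are needed.
)
```

The total batch size of 128 follows from the per-device batch size of 8 times 16 gradient-accumulation steps, assuming a single device.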

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.5565        | 0.0525 | 5    | 1.2728          | 291256            |
| 1.0932        | 0.1050 | 10   | 1.2207          | 576856            |
| 0.7974        | 0.1575 | 15   | 1.3414          | 859856            |
| 0.6902        | 0.2100 | 20   | 1.4786          | 1143424           |
| 0.408         | 0.2625 | 25   | 1.5763          | 1425944           |
| 0.2647        | 0.3150 | 30   | 1.7198          | 1705536           |
| 0.183         | 0.3675 | 35   | 1.8118          | 1986632           |
| 0.1028        | 0.4199 | 40   | 1.8767          | 2264728           |
| 0.0944        | 0.4724 | 45   | 1.9227          | 2547680           |
| 0.054         | 0.5249 | 50   | 1.9488          | 2835016           |
| 0.0758        | 0.5774 | 55   | 1.8554          | 3117064           |
| 0.0682        | 0.6299 | 60   | 1.8586          | 3408136           |
| 0.0638        | 0.6824 | 65   | 1.9812          | 3686176           |
| 0.0693        | 0.7349 | 70   | 2.0412          | 3968392           |
| 0.049         | 0.7874 | 75   | 2.0365          | 4248416           |
| 0.0394        | 0.8399 | 80   | 1.9713          | 4537376           |
| 0.0341        | 0.8924 | 85   | 1.9624          | 4822888           |
| 0.0517        | 0.9449 | 90   | 1.9243          | 5112896           |
| 0.0362        | 0.9974 | 95   | 1.9480          | 5391904           |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
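
To reproduce the environment, a quick sanity check of the pinned versions, assuming the four packages above are installed:

```python
# Verify that installed packages match the versions pinned above.
import datasets
import tokenizers
import torch
import transformers

print("Transformers:", transformers.__version__)  # expect 4.44.0
print("PyTorch:", torch.__version__)              # expect 2.4.0+cu121
print("Datasets:", datasets.__version__)          # expect 2.20.0
print("Tokenizers:", tokenizers.__version__)      # expect 0.19.1
```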

Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter3_sftsd0

  • Base model: google/gemma-2-2b