---
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_replace_iter18_sftsd2
    results: []
---

# collapse_gemma-2-2b_hs2_replace_iter18_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set (a usage sketch follows the list):

- Loss: 2.6236
- Num input tokens seen: 4624112
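Since the usage sections below are still marked "More information needed", here is a minimal inference sketch using the `transformers` library. The repository id `RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter18_sftsd2` is assumed from the model name on this card, and the dtype/device settings are illustrative defaults, not documented choices.

```python
# Minimal inference sketch; the repo id and dtype are assumptions, not documented.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter18_sftsd2"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed; Gemma-2 checkpoints are commonly run in bf16
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```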

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (an illustrative configuration sketch follows the list):

- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
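As a rough illustration, the list above maps onto `transformers.TrainingArguments` as sketched below. This is a reconstruction, not the actual training script: the dataset, the `output_dir`, and any TRL `SFTTrainer`-specific arguments are unknown. The total train batch size of 128 follows from 8 per-device samples times 16 gradient-accumulation steps.

```python
# Illustrative reconstruction of the training configuration above.
# output_dir is a placeholder; the real script and dataset are undocumented.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter18_sftsd2",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # effective train batch size: 8 * 16 = 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,    # Adam settings match the values listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```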

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.5938        | 0.0511 | 5    | 1.2801          | 239496            |
| 0.8257        | 0.1021 | 10   | 1.3276          | 475896            |
| 0.3781        | 0.1532 | 15   | 1.5795          | 715416            |
| 0.203         | 0.2042 | 20   | 1.7945          | 958344            |
| 0.0828        | 0.2553 | 25   | 2.1123          | 1201808           |
| 0.0475        | 0.3063 | 30   | 2.2986          | 1441936           |
| 0.0255        | 0.3574 | 35   | 2.4401          | 1685128           |
| 0.0238        | 0.4084 | 40   | 2.5437          | 1923208           |
| 0.0205        | 0.4595 | 45   | 2.6059          | 2159992           |
| 0.0217        | 0.5105 | 50   | 2.6290          | 2396440           |
| 0.0236        | 0.5616 | 55   | 2.6241          | 2630120           |
| 0.0209        | 0.6126 | 60   | 2.6176          | 2871120           |
| 0.0202        | 0.6637 | 65   | 2.6088          | 3102520           |
| 0.0202        | 0.7147 | 70   | 2.6099          | 3337176           |
| 0.0194        | 0.7658 | 75   | 2.6224          | 3580072           |
| 0.022         | 0.8168 | 80   | 2.6128          | 3811176           |
| 0.0201        | 0.8679 | 85   | 2.6123          | 4053312           |
| 0.022         | 0.9190 | 90   | 2.6136          | 4283408           |
| 0.0201        | 0.9700 | 95   | 2.6248          | 4529888           |

### Framework versions

- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1