---
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_replace_iter6_sftsd1
    results: []
---

# collapse_gemma-2-2b_hs2_replace_iter6_sftsd1

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:

- Loss: 2.4776
- Num Input Tokens Seen: 4931704

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
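The listed `total_train_batch_size` is a derived value rather than an independent setting; a minimal sanity check of how it follows from the per-device batch size and gradient accumulation:

```python
# Hyperparameters copied from the list above.
train_batch_size = 8            # per-device train batch size
gradient_accumulation_steps = 16

# Gradients are accumulated over 16 micro-batches before each optimizer
# step, so the effective (total) train batch size is the product.
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 128, matching the listed value
```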

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.5189        | 0.0513 | 5    | 1.2749          | 258200            |
| 0.9714        | 0.1026 | 10   | 1.2495          | 517512            |
| 0.6202        | 0.1539 | 15   | 1.4088          | 775024            |
| 0.3538        | 0.2053 | 20   | 1.6032          | 1026560           |
| 0.2158        | 0.2566 | 25   | 1.8219          | 1270944           |
| 0.1167        | 0.3079 | 30   | 2.0376          | 1527480           |
| 0.0654        | 0.3592 | 35   | 2.2660          | 1777448           |
| 0.0393        | 0.4105 | 40   | 2.3894          | 2029984           |
| 0.031         | 0.4618 | 45   | 2.4278          | 2278552           |
| 0.0292        | 0.5131 | 50   | 2.4650          | 2534640           |
| 0.0258        | 0.5645 | 55   | 2.4896          | 2783408           |
| 0.0255        | 0.6158 | 60   | 2.4676          | 3035384           |
| 0.0235        | 0.6671 | 65   | 2.4426          | 3294576           |
| 0.0249        | 0.7184 | 70   | 2.4442          | 3548680           |
| 0.0231        | 0.7697 | 75   | 2.4505          | 3807912           |
| 0.0249        | 0.8210 | 80   | 2.4582          | 4065352           |
| 0.0225        | 0.8724 | 85   | 2.4512          | 4318600           |
| 0.0216        | 0.9237 | 90   | 2.4613          | 4577512           |
| 0.021         | 0.9750 | 95   | 2.4749          | 4833752           |
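The trajectory above shows validation loss bottoming out early and then climbing while training loss keeps falling, which is the usual signature of overfitting. A minimal sketch (with the `(step, validation_loss)` pairs copied from the table) that locates the best eval checkpoint:

```python
# (step, validation_loss) pairs copied from the results table above.
eval_history = [
    (0, 1.3909), (5, 1.2749), (10, 1.2495), (15, 1.4088), (20, 1.6032),
    (25, 1.8219), (30, 2.0376), (35, 2.2660), (40, 2.3894), (45, 2.4278),
    (50, 2.4650), (55, 2.4896), (60, 2.4676), (65, 2.4426), (70, 2.4442),
    (75, 2.4505), (80, 2.4582), (85, 2.4512), (90, 2.4613), (95, 2.4749),
]

# Find the step with the lowest validation loss.
best_step, best_loss = min(eval_history, key=lambda row: row[1])
print(best_step, best_loss)  # step 10, loss 1.2495
```

Eval loss at the final step (2.4749) is nearly double the best value (1.2495 at step 10), so the final weights reported here are far from the best-eval checkpoint.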

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1