metadata

license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_replace_iter3_sftsd1
    results: []

collapse_gemma-2-2b_hs2_replace_iter3_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set at the end of training (step 95, epoch 1.0):

Loss: 1.9367
Num Input Tokens Seen: 5020224

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 8e-06
train_batch_size: 8
eval_batch_size: 16
seed: 1
gradient_accumulation_steps: 16
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: constant_with_warmup
lr_scheduler_warmup_ratio: 0.05
num_epochs: 1
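
The batch-size and warmup values above are related by simple arithmetic, sketched below. This is an illustration, not the training script. It assumes the usual Trainer conventions: the effective batch size is the per-device batch size times the accumulation steps times the number of devices, the warmup length is the warmup ratio times the total optimizer steps rounded up, and the total of 95 steps is taken from the last row of the training results table. The names `num_devices`, `warmup_steps`, and `lr_at` are hypothetical helpers introduced here.

```python
import math

# Values copied from the hyperparameter list in this card.
learning_rate = 8e-06
train_batch_size = 8
gradient_accumulation_steps = 16
total_train_batch_size = 128
warmup_ratio = 0.05
total_steps = 95  # assumption: final optimizer step in the results table

# Assumption: effective batch = per-device batch * accumulation steps * devices.
num_devices = total_train_batch_size // (train_batch_size * gradient_accumulation_steps)

# Assumption: warmup steps = ceil(total steps * warmup ratio).
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Linear warmup to the base learning rate, then constant."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate


print(f"implied devices: {num_devices}")
print(f"warmup steps:    {warmup_steps}")
print(f"lr at step 2:    {lr_at(2):.2e}")
print(f"lr at step 50:   {lr_at(50):.2e}")
```

Under these assumptions the numbers are consistent with a single device (8 x 16 = 128) and a warmup that ends at about the first logged evaluation step.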

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
No log	0	0	1.3909	0
1.5266	0.0526	5	1.2670	264096
1.1256	0.1053	10	1.2054	527848
0.7847	0.1579	15	1.3177	792400
0.6784	0.2105	20	1.4672	1052232
0.334	0.2632	25	1.6317	1312448
0.2438	0.3158	30	1.7383	1578664
0.1837	0.3684	35	1.8782	1839144
0.1245	0.4211	40	1.9244	2108176
0.0826	0.4737	45	2.0330	2367432
0.0758	0.5263	50	1.9421	2626312
0.0651	0.5789	55	1.9341	2894088
0.0576	0.6316	60	2.0003	3163960
0.1138	0.6842	65	1.9950	3426520
0.0404	0.7368	70	1.9335	3690192
0.0431	0.7895	75	1.8592	3956584
0.0351	0.8421	80	1.9416	4223128
0.0702	0.8947	85	1.9729	4490032
0.0323	0.9474	90	1.8898	4755864
0.0432	1.0	95	1.9367	5020224
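
The table can be inspected programmatically to see where validation loss was lowest. The sketch below copies the step, training loss, and validation loss columns from the table and finds the minimum. The variable names are hypothetical, and nothing here says which checkpoint was actually saved or published.

```python
# (step, training_loss, validation_loss) rows copied from the table above.
# Step 0 has no logged training loss, so it is stored as None.
rows = [
    (0, None, 1.3909), (5, 1.5266, 1.2670), (10, 1.1256, 1.2054),
    (15, 0.7847, 1.3177), (20, 0.6784, 1.4672), (25, 0.334, 1.6317),
    (30, 0.2438, 1.7383), (35, 0.1837, 1.8782), (40, 0.1245, 1.9244),
    (45, 0.0826, 2.0330), (50, 0.0758, 1.9421), (55, 0.0651, 1.9341),
    (60, 0.0576, 2.0003), (65, 0.1138, 1.9950), (70, 0.0404, 1.9335),
    (75, 0.0431, 1.8592), (80, 0.0351, 1.9416), (85, 0.0702, 1.9729),
    (90, 0.0323, 1.8898), (95, 0.0432, 1.9367),
]

# Row with the lowest validation loss.
best_step, best_train, best_val = min(rows, key=lambda r: r[2])

initial_val = rows[0][2]
final_step, final_train, final_val = rows[-1]

print(f"lowest validation loss: {best_val} at step {best_step}")
print(f"initial validation loss: {initial_val}")
print(f"final validation loss:   {final_val} at step {final_step}")
```

In these rows, validation loss is lowest at step 10 and ends above its step-0 value, while training loss keeps falling. That pattern is commonly read as overfitting to the training data.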

Framework versions

Transformers 4.44.0
Pytorch 2.4.0+cu121
Datasets 2.20.0
Tokenizers 0.19.1