---
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_replace_iter3_sftsd0
    results: []
---

collapse_gemma-2-2b_hs2_replace_iter3_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.9480
  • Num Input Tokens Seen: 5,391,904
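As a sketch only: a checkpoint published under this card's name would typically be loaded with the standard Transformers API. The repo id below is inferred from the model name above and is an assumption; downloading the ~2B-parameter weights requires network access and sufficient memory.

```python
# Sketch: loading this fine-tuned checkpoint via Transformers.
# The repo id is an assumption inferred from the model name in this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter3_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Generate a short continuation from a toy prompt.
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```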

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
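The reported total_train_batch_size is not set directly; it is derived from the per-device batch size and the gradient accumulation steps. A minimal sketch of that relationship, using the values listed above:

```python
# Hyperparameters from the list above, collected in one dict.
# total_train_batch_size is derived: per-device batch size times
# gradient accumulation steps (8 * 16 = 128).
hparams = {
    "learning_rate": 8e-06,
    "train_batch_size": 8,          # per device
    "eval_batch_size": 16,
    "seed": 0,
    "gradient_accumulation_steps": 16,
    "lr_scheduler_type": "constant_with_warmup",
    "lr_scheduler_warmup_ratio": 0.05,
    "num_epochs": 1,
}

total_train_batch_size = (
    hparams["train_batch_size"] * hparams["gradient_accumulation_steps"]
)
print(total_train_batch_size)  # 128
```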

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.5565        | 0.0525 | 5    | 1.2728          | 291256            |
| 1.0932        | 0.1050 | 10   | 1.2207          | 576856            |
| 0.7974        | 0.1575 | 15   | 1.3414          | 859856            |
| 0.6902        | 0.2100 | 20   | 1.4786          | 1143424           |
| 0.408         | 0.2625 | 25   | 1.5763          | 1425944           |
| 0.2647        | 0.3150 | 30   | 1.7198          | 1705536           |
| 0.183         | 0.3675 | 35   | 1.8118          | 1986632           |
| 0.1028        | 0.4199 | 40   | 1.8767          | 2264728           |
| 0.0944        | 0.4724 | 45   | 1.9227          | 2547680           |
| 0.054         | 0.5249 | 50   | 1.9488          | 2835016           |
| 0.0758        | 0.5774 | 55   | 1.8554          | 3117064           |
| 0.0682        | 0.6299 | 60   | 1.8586          | 3408136           |
| 0.0638        | 0.6824 | 65   | 1.9812          | 3686176           |
| 0.0693        | 0.7349 | 70   | 2.0412          | 3968392           |
| 0.049         | 0.7874 | 75   | 2.0365          | 4248416           |
| 0.0394        | 0.8399 | 80   | 1.9713          | 4537376           |
| 0.0341        | 0.8924 | 85   | 1.9624          | 4822888           |
| 0.0517        | 0.9449 | 90   | 1.9243          | 5112896           |
| 0.0362        | 0.9974 | 95   | 1.9480          | 5391904           |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1