---
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_replace_iter19_sftsd0
    results: []
---

# collapse_gemma-2-2b_hs2_replace_iter19_sftsd0

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.6879
  • Num Input Tokens Seen: 4457712
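
For convenience, below is a minimal usage sketch with the Transformers AutoClasses. The repository id is assumed from the model name on this card, and the prompt is a placeholder.

```python
# Minimal usage sketch (not part of the original card).
# The repo id below is assumed from the model name on this card;
# adjust it if the checkpoint lives elsewhere.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter19_sftsd0"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Placeholder prompt to sanity-check generation.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```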

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
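
For illustration only, these hyperparameters map roughly onto a TRL `SFTConfig` as sketched below. Only the listed values come from this card; the output directory, dataset, and trainer wiring are unknown and shown as placeholders.

```python
# Hedged sketch: mapping the reported hyperparameters onto a TRL SFTConfig.
# Only the numeric values are taken from this card; everything else is a placeholder.
from trl import SFTConfig

config = SFTConfig(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter19_sftsd0",  # placeholder
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,  # effective batch size 8 * 16 = 128, as reported
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
)
```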

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.4638        | 0.0511 | 5    | 1.2821          | 233808            |
| 0.7283        | 0.1022 | 10   | 1.3806          | 468336            |
| 0.3747        | 0.1533 | 15   | 1.6459          | 702960            |
| 0.2055        | 0.2043 | 20   | 1.8690          | 932896            |
| 0.1423        | 0.2554 | 25   | 2.1258          | 1154072           |
| 0.0546        | 0.3065 | 30   | 2.2993          | 1386840           |
| 0.037         | 0.3576 | 35   | 2.4683          | 1615120           |
| 0.0267        | 0.4087 | 40   | 2.5698          | 1842904           |
| 0.0257        | 0.4598 | 45   | 2.6469          | 2075912           |
| 0.0253        | 0.5109 | 50   | 2.6736          | 2302096           |
| 0.0224        | 0.5619 | 55   | 2.6900          | 2534784           |
| 0.0239        | 0.6130 | 60   | 2.6892          | 2764288           |
| 0.023         | 0.6641 | 65   | 2.6803          | 2994536           |
| 0.023         | 0.7152 | 70   | 2.6896          | 3222064           |
| 0.0235        | 0.7663 | 75   | 2.6908          | 3452832           |
| 0.0221        | 0.8174 | 80   | 2.6687          | 3686848           |
| 0.022         | 0.8685 | 85   | 2.6683          | 3918480           |
| 0.0229        | 0.9195 | 90   | 2.6810          | 4142336           |
| 0.0222        | 0.9706 | 95   | 2.6850          | 4367928           |

### Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
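
A small sketch for checking that a local environment matches these versions; the exact builds (including the `+cu121` CUDA suffix) may differ.

```python
# Hedged sketch: print installed versions to compare against the list above.
import datasets
import tokenizers
import torch
import transformers

print(transformers.__version__)  # card reports 4.44.0
print(torch.__version__)         # card reports 2.4.0+cu121
print(datasets.__version__)      # card reports 2.20.0
print(tokenizers.__version__)    # card reports 0.19.1
```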