metadata

license: gemma
base_model: google/gemma-2-9b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-9b_hs2_replace_iter4_sftsd2
    results: []

collapse_gemma-2-9b_hs2_replace_iter4_sftsd2

This model is a fine-tuned version of google/gemma-2-9b. The training dataset is not specified. It achieves the following results on the evaluation set:

Loss: 1.4790
Num Input Tokens Seen: 4639036

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 8e-06
train_batch_size: 4
eval_batch_size: 16
seed: 2
gradient_accumulation_steps: 32
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: constant_with_warmup
lr_scheduler_warmup_ratio: 0.05
num_epochs: 1
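
The relationship between these values can be checked with a little arithmetic. The sketch below is illustrative only and is not taken from the training code. It assumes a single device (which is what 4 × 32 = 128 implies) and roughly 98 optimizer steps in the epoch, an estimate derived from the step and epoch columns of the training results table (95 / 0.9709). The names `num_devices`, `estimated_total_steps`, and `lr_at` are hypothetical. The schedule shape (linear warmup from zero, then a constant rate) is the usual reading of `constant_with_warmup`.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 4
gradient_accumulation_steps = 32
learning_rate = 8e-06
warmup_ratio = 0.05

# Assumption: one device, which is what 4 * 32 = 128 implies.
num_devices = 1
total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)

# Assumption: about 98 optimizer steps in the single epoch, estimated
# from the training log (step 95 corresponds to epoch 0.9709).
estimated_total_steps = 98
warmup_steps = math.ceil(estimated_total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Linear warmup from 0 to the base rate, then a constant rate."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate


print(total_train_batch_size)  # effective batch size per optimizer step
print(warmup_steps)            # number of warmup steps under the assumptions
print(lr_at(2), lr_at(50))     # rate during warmup and after it
```

Under these assumptions the warmup covers only the first few optimizer steps, after which every update uses the full learning rate of 8e-06.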

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
No log	0	0	1.2335	0
1.2061	0.0511	5	1.0694	240324
0.5247	0.1022	10	1.1930	482000
0.2089	0.1533	15	1.3107	712380
0.0939	0.2044	20	1.4465	957020
0.0504	0.2555	25	1.5378	1188912
0.0456	0.3066	30	1.4778	1428132
0.0331	0.3577	35	1.4145	1677336
0.0238	0.4088	40	1.4888	1914204
0.0255	0.4599	45	1.5425	2146180
0.0243	0.5110	50	1.5185	2379516
0.0381	0.5621	55	1.4742	2619096
0.0305	0.6132	60	1.4191	2862804
0.0227	0.6643	65	1.4256	3103004
0.0210	0.7154	70	1.4350	3346964
0.0279	0.7665	75	1.4590	3587168
0.0242	0.8176	80	1.5009	3830384
0.0262	0.8687	85	1.4784	4068408
0.0244	0.9198	90	1.4782	4308452
0.0228	0.9709	95	1.4777	4542732
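
The table can be summarised programmatically. The following sketch copies the step and validation loss columns from the rows above and reports where validation loss is lowest. The variable names are illustrative and not part of any training script.

```python
# (step, validation_loss) pairs copied from the training results table.
validation_log = [
    (0, 1.2335), (5, 1.0694), (10, 1.1930), (15, 1.3107),
    (20, 1.4465), (25, 1.5378), (30, 1.4778), (35, 1.4145),
    (40, 1.4888), (45, 1.5425), (50, 1.5185), (55, 1.4742),
    (60, 1.4191), (65, 1.4256), (70, 1.4350), (75, 1.4590),
    (80, 1.5009), (85, 1.4784), (90, 1.4782), (95, 1.4777),
]

# Checkpoint with the lowest validation loss.
best_step, best_loss = min(validation_log, key=lambda row: row[1])

# Last logged evaluation and its distance from the best one.
final_step, final_loss = validation_log[-1]
gap = final_loss - best_loss

print(f"best validation loss {best_loss:.4f} at step {best_step}")
print(f"final validation loss {final_loss:.4f} at step {final_step}")
print(f"gap between final and best: {gap:.4f}")
```

In these logged values the lowest validation loss occurs at the first evaluation after training starts, while the training loss keeps falling to roughly 0.02 in later steps.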

Framework versions

Transformers 4.44.0
Pytorch 2.4.0+cu121
Datasets 2.20.0
Tokenizers 0.19.1