# collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd0
This model is a fine-tuned version of [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.9289
- Num Input Tokens Seen: 13382208
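The snippet below is a minimal loading-and-generation sketch using the standard `transformers` API. It assumes the checkpoint is hosted under the repository id `RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd0` (the repository name above) and that enough GPU memory is available for a 27B-parameter model; quantization and offloading are not shown.

```python
# Minimal loading sketch; assumes the repository id below and sufficient
# GPU memory for a 27B model (quantization/offloading not shown).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```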
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
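As an illustrative reconstruction (not the exact training script), the hyperparameters above map onto `transformers.TrainingArguments` roughly as follows; the output directory is a placeholder, and the total train batch size of 128 follows from training on a single device (4 × 32).

```python
# Approximate reconstruction of the hyperparameters listed above;
# "./output" is a placeholder, not the original path.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./output",
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=32,   # 4 * 32 = 128 total train batch size on one device
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08, as stated above
    # (these match the Transformers defaults).
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```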
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.1282 | 0 |
| 2.5273 | 0.0186 | 5 | 1.0497 | 239432 |
| 2.2642 | 0.0371 | 10 | 0.9938 | 490472 |
| 2.1944 | 0.0557 | 15 | 0.9799 | 738476 |
| 2.0449 | 0.0742 | 20 | 0.9761 | 991768 |
| 1.7622 | 0.0928 | 25 | 0.9788 | 1234816 |
| 1.6823 | 0.1113 | 30 | 0.9860 | 1486428 |
| 1.5237 | 0.1299 | 35 | 0.9862 | 1735404 |
| 1.4638 | 0.1484 | 40 | 0.9833 | 1983880 |
| 1.2775 | 0.1670 | 45 | 0.9803 | 2226820 |
| 1.246 | 0.1855 | 50 | 0.9762 | 2471660 |
| 1.1798 | 0.2041 | 55 | 0.9701 | 2723564 |
| 1.1618 | 0.2226 | 60 | 0.9658 | 2969216 |
| 1.1255 | 0.2412 | 65 | 0.9656 | 3218648 |
| 0.902 | 0.2597 | 70 | 0.9609 | 3474940 |
| 0.873 | 0.2783 | 75 | 0.9577 | 3721068 |
| 0.7585 | 0.2968 | 80 | 0.9560 | 3977036 |
| 0.9329 | 0.3154 | 85 | 0.9542 | 4227848 |
| 0.9888 | 0.3340 | 90 | 0.9544 | 4471040 |
| 0.8856 | 0.3525 | 95 | 0.9510 | 4719044 |
| 0.8959 | 0.3711 | 100 | 0.9519 | 4966088 |
| 0.707 | 0.3896 | 105 | 0.9476 | 5210868 |
| 0.8089 | 0.4082 | 110 | 0.9476 | 5470016 |
| 0.7476 | 0.4267 | 115 | 0.9459 | 5718420 |
| 0.6473 | 0.4453 | 120 | 0.9438 | 5972536 |
| 0.758 | 0.4638 | 125 | 0.9435 | 6221248 |
| 0.8454 | 0.4824 | 130 | 0.9403 | 6475340 |
| 0.7976 | 0.5009 | 135 | 0.9412 | 6727528 |
| 0.8476 | 0.5195 | 140 | 0.9400 | 6982388 |
| 0.7554 | 0.5380 | 145 | 0.9387 | 7218200 |
| 0.7193 | 0.5566 | 150 | 0.9386 | 7466484 |
| 0.6614 | 0.5751 | 155 | 0.9378 | 7709588 |
| 0.7586 | 0.5937 | 160 | 0.9344 | 7958964 |
| 0.769 | 0.6122 | 165 | 0.9353 | 8214680 |
| 0.6696 | 0.6308 | 170 | 0.9347 | 8457832 |
| 0.8566 | 0.6494 | 175 | 0.9377 | 8710088 |
| 0.8531 | 0.6679 | 180 | 0.9346 | 8959260 |
| 0.8454 | 0.6865 | 185 | 0.9346 | 9216248 |
| 0.7314 | 0.7050 | 190 | 0.9330 | 9465964 |
| 0.914 | 0.7236 | 195 | 0.9326 | 9718276 |
| 0.6292 | 0.7421 | 200 | 0.9335 | 9963556 |
| 0.683 | 0.7607 | 205 | 0.9348 | 10204596 |
| 0.5968 | 0.7792 | 210 | 0.9338 | 10460212 |
| 0.7731 | 0.7978 | 215 | 0.9338 | 10712008 |
| 0.707 | 0.8163 | 220 | 0.9318 | 10955092 |
| 0.7059 | 0.8349 | 225 | 0.9348 | 11197300 |
| 0.6878 | 0.8534 | 230 | 0.9301 | 11440440 |
| 0.6978 | 0.8720 | 235 | 0.9312 | 11685992 |
| 0.8379 | 0.8905 | 240 | 0.9294 | 11928976 |
| 0.8208 | 0.9091 | 245 | 0.9331 | 12185160 |
| 0.7653 | 0.9276 | 250 | 0.9314 | 12430192 |
| 0.7021 | 0.9462 | 255 | 0.9295 | 12684252 |
| 0.78 | 0.9647 | 260 | 0.9327 | 12932032 |
| 0.6731 | 0.9833 | 265 | 0.9279 | 13180768 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
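When reproducing results, it may help to pin these versions. The snippet below is a small convenience check against the versions listed above; matching the exact patch versions is an assumption, not a requirement stated by the card.

```python
# Convenience check of installed versions against the ones listed above.
import transformers, torch, datasets, tokenizers

for name, module, expected in [
    ("transformers", transformers, "4.44.0"),
    ("torch", torch, "2.4.0+cu121"),
    ("datasets", datasets, "2.20.0"),
    ("tokenizers", tokenizers, "0.19.1"),
]:
    print(f"{name}: installed {module.__version__}, card lists {expected}")
```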