collapse_gemma-2-9b_hs2_accumulate_iter4_sftsd1

This model is a fine-tuned version of google/gemma-2-9b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9485
  • Num Input Tokens Seen: 19153848

Model description

This is a fine-tune of google/gemma-2-9b, a 9.24B-parameter causal language model; the weights are published as BF16 Safetensors. Beyond that, more information is needed.

Intended uses & limitations

More information needed
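No intended use has been documented. Purely as an illustration, the checkpoint loads like any Hugging Face causal LM. The sketch below assumes a GPU with enough memory for a 9B-parameter model in BF16 and the `accelerate` package (for `device_map="auto"`); it is not an endorsed usage.

```python
# Minimal loading/generation sketch for this checkpoint (illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter4_sftsd1"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.bfloat16,  # weights are published in BF16
    device_map="auto",           # requires the accelerate package
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```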

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged sketch of how they map onto code follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
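
As a reading aid only, here is how these settings map onto `transformers.TrainingArguments`. The actual training script is not published; the `output_dir` and the comments are assumptions drawn from this card.

```python
# Hypothetical reconstruction of the configuration above, NOT the author's
# actual training script. Dataset loading and Trainer wiring are omitted.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter4_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=4,   # train_batch_size
    per_device_eval_batch_size=16,   # eval_batch_size
    seed=1,
    gradient_accumulation_steps=32,  # 4 x 32 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,               # lr_scheduler_warmup_ratio
    num_train_epochs=1,
    bf16=True,                       # weights are published in BF16
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 matches transformers'
    # defaults (adam_beta1, adam_beta2, adam_epsilon), so no override needed.
)
```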

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.2335          | 0                 |
| 1.3147        | 0.0133 | 5    | 1.1832          | 254068            |
| 1.0498        | 0.0265 | 10   | 1.0719          | 516552            |
| 0.9485        | 0.0398 | 15   | 1.0240          | 772872            |
| 0.702         | 0.0530 | 20   | 1.0216          | 1026756           |
| 0.5628        | 0.0663 | 25   | 1.0325          | 1281388           |
| 0.5407        | 0.0796 | 30   | 1.0239          | 1536964           |
| 0.368         | 0.0928 | 35   | 1.0172          | 1794688           |
| 0.3617        | 0.1061 | 40   | 1.0116          | 2049692           |
| 0.4112        | 0.1194 | 45   | 1.0076          | 2297888           |
| 0.3436        | 0.1326 | 50   | 1.0005          | 2555848           |
| 0.3919        | 0.1459 | 55   | 0.9977          | 2811876           |
| 0.3196        | 0.1591 | 60   | 0.9936          | 3062460           |
| 0.3427        | 0.1724 | 65   | 0.9888          | 3309344           |
| 0.3779        | 0.1857 | 70   | 0.9873          | 3558348           |
| 0.3086        | 0.1989 | 75   | 0.9835          | 3812356           |
| 0.3156        | 0.2122 | 80   | 0.9815          | 4067536           |
| 0.3243        | 0.2254 | 85   | 0.9801          | 4320128           |
| 0.353         | 0.2387 | 90   | 0.9767          | 4574252           |
| 0.315         | 0.2520 | 95   | 0.9759          | 4824900           |
| 0.2642        | 0.2652 | 100  | 0.9737          | 5082344           |
| 0.2696        | 0.2785 | 105  | 0.9740          | 5335240           |
| 0.3357        | 0.2918 | 110  | 0.9728          | 5593492           |
| 0.2995        | 0.3050 | 115  | 0.9700          | 5844120           |
| 0.2826        | 0.3183 | 120  | 0.9680          | 6100756           |
| 0.3509        | 0.3315 | 125  | 0.9657          | 6349476           |
| 0.2684        | 0.3448 | 130  | 0.9653          | 6605052           |
| 0.3148        | 0.3581 | 135  | 0.9654          | 6859396           |
| 0.2782        | 0.3713 | 140  | 0.9634          | 7111896           |
| 0.288         | 0.3846 | 145  | 0.9626          | 7365476           |
| 0.292         | 0.3978 | 150  | 0.9608          | 7623304           |
| 0.3483        | 0.4111 | 155  | 0.9601          | 7884416           |
| 0.2856        | 0.4244 | 160  | 0.9605          | 8142544           |
| 0.3717        | 0.4376 | 165  | 0.9597          | 8395160           |
| 0.258         | 0.4509 | 170  | 0.9583          | 8650004           |
| 0.3976        | 0.4642 | 175  | 0.9593          | 8900404           |
| 0.2696        | 0.4774 | 180  | 0.9585          | 9153460           |
| 0.3276        | 0.4907 | 185  | 0.9584          | 9401692           |
| 0.2893        | 0.5039 | 190  | 0.9571          | 9655896           |
| 0.3125        | 0.5172 | 195  | 0.9546          | 9917980           |
| 0.2823        | 0.5305 | 200  | 0.9554          | 10167660          |
| 0.3795        | 0.5437 | 205  | 0.9572          | 10416596          |
| 0.2921        | 0.5570 | 210  | 0.9555          | 10667880          |
| 0.2822        | 0.5702 | 215  | 0.9554          | 10922988          |
| 0.2741        | 0.5835 | 220  | 0.9562          | 11178148          |
| 0.2576        | 0.5968 | 225  | 0.9548          | 11431204          |
| 0.2774        | 0.6100 | 230  | 0.9549          | 11678968          |
| 0.3431        | 0.6233 | 235  | 0.9550          | 11938440          |
| 0.3598        | 0.6366 | 240  | 0.9552          | 12188988          |
| 0.321         | 0.6498 | 245  | 0.9537          | 12450232          |
| 0.3062        | 0.6631 | 250  | 0.9546          | 12702636          |
| 0.3252        | 0.6763 | 255  | 0.9538          | 12953464          |
| 0.2372        | 0.6896 | 260  | 0.9526          | 13213356          |
| 0.2677        | 0.7029 | 265  | 0.9538          | 13471308          |
| 0.2751        | 0.7161 | 270  | 0.9544          | 13719800          |
| 0.2369        | 0.7294 | 275  | 0.9540          | 13971304          |
| 0.2586        | 0.7426 | 280  | 0.9524          | 14218076          |
| 0.2612        | 0.7559 | 285  | 0.9531          | 14475828          |
| 0.3368        | 0.7692 | 290  | 0.9528          | 14736144          |
| 0.2279        | 0.7824 | 295  | 0.9513          | 14986776          |
| 0.2958        | 0.7957 | 300  | 0.9510          | 15241276          |
| 0.3639        | 0.8090 | 305  | 0.9527          | 15496724          |
| 0.3721        | 0.8222 | 310  | 0.9525          | 15751944          |
| 0.3751        | 0.8355 | 315  | 0.9516          | 16010436          |
| 0.3224        | 0.8487 | 320  | 0.9508          | 16264272          |
| 0.3244        | 0.8620 | 325  | 0.9492          | 16514404          |
| 0.2181        | 0.8753 | 330  | 0.9498          | 16771700          |
| 0.3845        | 0.8885 | 335  | 0.9521          | 17022500          |
| 0.2373        | 0.9018 | 340  | 0.9493          | 17281212          |
| 0.219         | 0.9150 | 345  | 0.9513          | 17531932          |
| 0.3645        | 0.9283 | 350  | 0.9508          | 17787156          |
| 0.3015        | 0.9416 | 355  | 0.9487          | 18042308          |
| 0.2971        | 0.9548 | 360  | 0.9504          | 18288852          |
| 0.2889        | 0.9681 | 365  | 0.9493          | 18546880          |
| 0.2924        | 0.9814 | 370  | 0.9493          | 18799684          |
| 0.2359        | 0.9946 | 375  | 0.9483          | 19052580          |

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
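
To check a local environment against this list, one can print the installed versions (a trivial sketch; the expected values in the comments come from the list above):

```python
# Print installed library versions to compare against the card's list.
import datasets, tokenizers, torch, transformers

print("Transformers:", transformers.__version__)  # card: 4.44.0
print("PyTorch:", torch.__version__)              # card: 2.4.0+cu121
print("Datasets:", datasets.__version__)          # card: 2.20.0
print("Tokenizers:", tokenizers.__version__)      # card: 0.19.1
```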