# collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd2
This model is a fine-tuned version of [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.9546
- Num Input Tokens Seen: 24032428
## Model description
More information needed
## Intended uses & limitations
More information needed
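No usage instructions are given in this card, so the following is a minimal loading sketch, assuming the checkpoint is published on the Hugging Face Hub under the repo ID shown on this page (`RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd2`); dtype and device placement are illustrative choices, not from the card:

```python
# Minimal sketch: load the fine-tuned checkpoint with transformers.
# Repo ID taken from this card; adjust dtype/device to your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 9B parameters; bf16 roughly halves memory vs fp32
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```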
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
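As a hedged illustration (the training script itself is not included in this card), the hyperparameters above map onto `transformers.TrainingArguments` roughly as sketched below. Note the arithmetic behind the total train batch size: per-device batch size 4 × 32 gradient-accumulation steps = 128 on a single device. The output directory is a placeholder:

```python
# Sketch only: reconstructs the reported hyperparameters as TrainingArguments.
# Dataset, model loading, and output path are placeholders, not from the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd2",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=4,   # 4 x 32 accumulation steps = 128 effective
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=32,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```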
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.2335 | 0 |
1.3474 | 0.0106 | 5 | 1.1927 | 258792 |
1.2078 | 0.0212 | 10 | 1.0768 | 508764 |
0.9674 | 0.0317 | 15 | 1.0312 | 767600 |
0.9091 | 0.0423 | 20 | 1.0227 | 1023848 |
0.6911 | 0.0529 | 25 | 1.0203 | 1274344 |
0.4705 | 0.0635 | 30 | 1.0319 | 1532132 |
0.4228 | 0.0741 | 35 | 1.0292 | 1786368 |
0.3975 | 0.0846 | 40 | 1.0309 | 2042540 |
0.3206 | 0.0952 | 45 | 1.0153 | 2291720 |
0.2994 | 0.1058 | 50 | 1.0148 | 2549012 |
0.3297 | 0.1164 | 55 | 1.0050 | 2798620 |
0.3171 | 0.1270 | 60 | 1.0040 | 3056108 |
0.3139 | 0.1375 | 65 | 1.0020 | 3317448 |
0.3386 | 0.1481 | 70 | 0.9962 | 3574540 |
0.2501 | 0.1587 | 75 | 0.9942 | 3832272 |
0.2482 | 0.1693 | 80 | 0.9906 | 4083760 |
0.3098 | 0.1799 | 85 | 0.9875 | 4336164 |
0.2415 | 0.1904 | 90 | 0.9910 | 4592732 |
0.2895 | 0.2010 | 95 | 0.9856 | 4844652 |
0.3474 | 0.2116 | 100 | 0.9849 | 5098604 |
0.2472 | 0.2222 | 105 | 0.9823 | 5355028 |
0.2587 | 0.2328 | 110 | 0.9795 | 5604792 |
0.2691 | 0.2433 | 115 | 0.9779 | 5860556 |
0.2396 | 0.2539 | 120 | 0.9761 | 6117824 |
0.2505 | 0.2645 | 125 | 0.9776 | 6368016 |
0.2609 | 0.2751 | 130 | 0.9765 | 6626036 |
0.3553 | 0.2857 | 135 | 0.9746 | 6883312 |
0.2906 | 0.2962 | 140 | 0.9750 | 7139620 |
0.2989 | 0.3068 | 145 | 0.9738 | 7392496 |
0.3201 | 0.3174 | 150 | 0.9707 | 7646420 |
0.2327 | 0.3280 | 155 | 0.9708 | 7896552 |
0.281 | 0.3386 | 160 | 0.9712 | 8147848 |
0.291 | 0.3491 | 165 | 0.9701 | 8401936 |
0.3371 | 0.3597 | 170 | 0.9699 | 8654404 |
0.1926 | 0.3703 | 175 | 0.9703 | 8904204 |
0.286 | 0.3809 | 180 | 0.9703 | 9158204 |
0.2423 | 0.3915 | 185 | 0.9669 | 9411824 |
0.245 | 0.4020 | 190 | 0.9668 | 9665800 |
0.2902 | 0.4126 | 195 | 0.9697 | 9920256 |
0.2895 | 0.4232 | 200 | 0.9675 | 10172112 |
0.2431 | 0.4338 | 205 | 0.9671 | 10423820 |
0.286 | 0.4444 | 210 | 0.9665 | 10685328 |
0.3157 | 0.4549 | 215 | 0.9656 | 10942652 |
0.225 | 0.4655 | 220 | 0.9658 | 11199576 |
0.2655 | 0.4761 | 225 | 0.9654 | 11458748 |
0.2338 | 0.4867 | 230 | 0.9646 | 11713636 |
0.2768 | 0.4973 | 235 | 0.9647 | 11970092 |
0.2008 | 0.5078 | 240 | 0.9649 | 12215860 |
0.2491 | 0.5184 | 245 | 0.9634 | 12470476 |
0.2654 | 0.5290 | 250 | 0.9622 | 12726180 |
0.233 | 0.5396 | 255 | 0.9652 | 12977864 |
0.2297 | 0.5502 | 260 | 0.9652 | 13228988 |
0.2123 | 0.5607 | 265 | 0.9641 | 13487916 |
0.3055 | 0.5713 | 270 | 0.9622 | 13742944 |
0.252 | 0.5819 | 275 | 0.9627 | 14001064 |
0.2156 | 0.5925 | 280 | 0.9633 | 14257372 |
0.2373 | 0.6031 | 285 | 0.9630 | 14515772 |
0.2533 | 0.6136 | 290 | 0.9633 | 14773828 |
0.3101 | 0.6242 | 295 | 0.9634 | 15032732 |
0.2549 | 0.6348 | 300 | 0.9640 | 15287912 |
0.2208 | 0.6454 | 305 | 0.9621 | 15542376 |
0.2164 | 0.6560 | 310 | 0.9607 | 15798084 |
0.1831 | 0.6665 | 315 | 0.9612 | 16052724 |
0.2364 | 0.6771 | 320 | 0.9615 | 16306560 |
0.2993 | 0.6877 | 325 | 0.9622 | 16558416 |
0.2002 | 0.6983 | 330 | 0.9607 | 16807372 |
0.1973 | 0.7089 | 335 | 0.9597 | 17064900 |
0.3301 | 0.7194 | 340 | 0.9594 | 17318932 |
0.319 | 0.7300 | 345 | 0.9598 | 17577108 |
0.2533 | 0.7406 | 350 | 0.9579 | 17826528 |
0.1998 | 0.7512 | 355 | 0.9565 | 18081976 |
0.2274 | 0.7618 | 360 | 0.9560 | 18343304 |
0.2253 | 0.7723 | 365 | 0.9567 | 18596660 |
0.2473 | 0.7829 | 370 | 0.9566 | 18850916 |
0.2654 | 0.7935 | 375 | 0.9565 | 19111356 |
0.2053 | 0.8041 | 380 | 0.9557 | 19366576 |
0.2462 | 0.8147 | 385 | 0.9549 | 19620052 |
0.2217 | 0.8252 | 390 | 0.9573 | 19876608 |
0.23 | 0.8358 | 395 | 0.9586 | 20129892 |
0.2487 | 0.8464 | 400 | 0.9563 | 20384164 |
0.1914 | 0.8570 | 405 | 0.9562 | 20634768 |
0.2452 | 0.8676 | 410 | 0.9581 | 20891764 |
0.1935 | 0.8781 | 415 | 0.9572 | 21139492 |
0.3047 | 0.8887 | 420 | 0.9544 | 21396976 |
0.2257 | 0.8993 | 425 | 0.9555 | 21644644 |
0.2405 | 0.9099 | 430 | 0.9558 | 21891880 |
0.2522 | 0.9205 | 435 | 0.9547 | 22153024 |
0.2481 | 0.9310 | 440 | 0.9527 | 22404200 |
0.2242 | 0.9416 | 445 | 0.9527 | 22657000 |
0.3352 | 0.9522 | 450 | 0.9527 | 22911872 |
0.1884 | 0.9628 | 455 | 0.9540 | 23163792 |
0.2011 | 0.9734 | 460 | 0.9537 | 23423212 |
0.1947 | 0.9839 | 465 | 0.9525 | 23677144 |
0.29 | 0.9945 | 470 | 0.9534 | 23929996 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1