# collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd0

This model is a fine-tuned version of [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.9550
- Num Input Tokens Seen: 24347300
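
The card does not include a usage snippet, so the following is a minimal sketch of loading the checkpoint with standard `transformers` APIs (the repo id is taken from this model's Hub page; the prompt and generation settings are illustrative only):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd0"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype="auto",  # use the dtype stored in the checkpoint config
    device_map="auto",   # requires `accelerate`; drop for plain CPU loading
)

prompt = "The quick brown fox"  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```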
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
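
As a sketch (not the author's actual training script), the hyperparameters above map onto Hugging Face `TrainingArguments` roughly as follows; `output_dir` and the single-device assumption behind the effective batch size are illustrative guesses:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd0",  # hypothetical path
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=32,  # 4 * 32 = 128 total train batch size (assuming one device)
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,    # Adam betas and epsilon as listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```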
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.2335 | 0 |
1.4207 | 0.0104 | 5 | 1.1924 | 257832 |
1.2547 | 0.0208 | 10 | 1.0837 | 513264 |
1.048 | 0.0312 | 15 | 1.0324 | 766088 |
0.851 | 0.0416 | 20 | 1.0155 | 1020968 |
0.7142 | 0.0520 | 25 | 1.0140 | 1278292 |
0.5167 | 0.0624 | 30 | 1.0245 | 1526988 |
0.3847 | 0.0728 | 35 | 1.0231 | 1784656 |
0.3415 | 0.0832 | 40 | 1.0195 | 2041944 |
0.3307 | 0.0936 | 45 | 1.0139 | 2297136 |
0.3229 | 0.1040 | 50 | 1.0051 | 2548836 |
0.2929 | 0.1144 | 55 | 1.0047 | 2801308 |
0.3572 | 0.1247 | 60 | 1.0015 | 3059268 |
0.2615 | 0.1351 | 65 | 0.9968 | 3313844 |
0.3591 | 0.1455 | 70 | 0.9979 | 3562524 |
0.2775 | 0.1559 | 75 | 0.9933 | 3813084 |
0.2585 | 0.1663 | 80 | 0.9926 | 4066920 |
0.2725 | 0.1767 | 85 | 0.9883 | 4324928 |
0.2611 | 0.1871 | 90 | 0.9864 | 4580444 |
0.353 | 0.1975 | 95 | 0.9846 | 4830100 |
0.2454 | 0.2079 | 100 | 0.9834 | 5075616 |
0.2581 | 0.2183 | 105 | 0.9827 | 5325788 |
0.2402 | 0.2287 | 110 | 0.9801 | 5572856 |
0.2896 | 0.2391 | 115 | 0.9798 | 5833092 |
0.2406 | 0.2495 | 120 | 0.9787 | 6088344 |
0.2592 | 0.2599 | 125 | 0.9775 | 6341032 |
0.2952 | 0.2703 | 130 | 0.9766 | 6602212 |
0.2539 | 0.2807 | 135 | 0.9747 | 6855168 |
0.2138 | 0.2911 | 140 | 0.9729 | 7107748 |
0.2758 | 0.3015 | 145 | 0.9749 | 7362420 |
0.2321 | 0.3119 | 150 | 0.9724 | 7615056 |
0.1961 | 0.3223 | 155 | 0.9723 | 7871808 |
0.2426 | 0.3327 | 160 | 0.9729 | 8125796 |
0.3097 | 0.3431 | 165 | 0.9724 | 8379432 |
0.1816 | 0.3535 | 170 | 0.9702 | 8630796 |
0.2926 | 0.3638 | 175 | 0.9708 | 8877972 |
0.2079 | 0.3742 | 180 | 0.9716 | 9125056 |
0.2498 | 0.3846 | 185 | 0.9712 | 9374900 |
0.2174 | 0.3950 | 190 | 0.9698 | 9627612 |
0.2393 | 0.4054 | 195 | 0.9707 | 9875172 |
0.2249 | 0.4158 | 200 | 0.9702 | 10126424 |
0.2005 | 0.4262 | 205 | 0.9681 | 10377444 |
0.2247 | 0.4366 | 210 | 0.9671 | 10631588 |
0.2091 | 0.4470 | 215 | 0.9659 | 10892176 |
0.2034 | 0.4574 | 220 | 0.9664 | 11142312 |
0.2397 | 0.4678 | 225 | 0.9670 | 11402760 |
0.2517 | 0.4782 | 230 | 0.9644 | 11657056 |
0.3291 | 0.4886 | 235 | 0.9645 | 11902520 |
0.2911 | 0.4990 | 240 | 0.9631 | 12151572 |
0.2172 | 0.5094 | 245 | 0.9614 | 12414324 |
0.2315 | 0.5198 | 250 | 0.9618 | 12672860 |
0.2185 | 0.5302 | 255 | 0.9623 | 12927456 |
0.2485 | 0.5406 | 260 | 0.9601 | 13184492 |
0.3055 | 0.5510 | 265 | 0.9610 | 13440432 |
0.2073 | 0.5614 | 270 | 0.9615 | 13691400 |
0.3036 | 0.5718 | 275 | 0.9600 | 13940060 |
0.2752 | 0.5822 | 280 | 0.9594 | 14195336 |
0.2654 | 0.5926 | 285 | 0.9597 | 14447088 |
0.3343 | 0.6029 | 290 | 0.9593 | 14706880 |
0.3417 | 0.6133 | 295 | 0.9601 | 14964384 |
0.2027 | 0.6237 | 300 | 0.9586 | 15212096 |
0.2576 | 0.6341 | 305 | 0.9574 | 15460956 |
0.2259 | 0.6445 | 310 | 0.9583 | 15719636 |
0.245 | 0.6549 | 315 | 0.9566 | 15980620 |
0.2193 | 0.6653 | 320 | 0.9582 | 16237904 |
0.2397 | 0.6757 | 325 | 0.9595 | 16490244 |
0.2264 | 0.6861 | 330 | 0.9567 | 16749444 |
0.2564 | 0.6965 | 335 | 0.9565 | 16996912 |
0.2242 | 0.7069 | 340 | 0.9561 | 17255000 |
0.2263 | 0.7173 | 345 | 0.9544 | 17508552 |
0.2417 | 0.7277 | 350 | 0.9554 | 17761116 |
0.2355 | 0.7381 | 355 | 0.9538 | 18019056 |
0.2344 | 0.7485 | 360 | 0.9538 | 18273916 |
0.2404 | 0.7589 | 365 | 0.9565 | 18524148 |
0.1552 | 0.7693 | 370 | 0.9577 | 18777776 |
0.2278 | 0.7797 | 375 | 0.9569 | 19028256 |
0.2164 | 0.7901 | 380 | 0.9555 | 19288972 |
0.1864 | 0.8005 | 385 | 0.9569 | 19539736 |
0.2767 | 0.8109 | 390 | 0.9572 | 19789388 |
0.2737 | 0.8213 | 395 | 0.9565 | 20034884 |
0.2266 | 0.8317 | 400 | 0.9566 | 20285948 |
0.2633 | 0.8421 | 405 | 0.9586 | 20534344 |
0.1812 | 0.8524 | 410 | 0.9545 | 20788364 |
0.2365 | 0.8628 | 415 | 0.9527 | 21043348 |
0.2148 | 0.8732 | 420 | 0.9536 | 21296084 |
0.2508 | 0.8836 | 425 | 0.9556 | 21551736 |
0.2298 | 0.8940 | 430 | 0.9553 | 21803092 |
0.2442 | 0.9044 | 435 | 0.9564 | 22059360 |
0.2786 | 0.9148 | 440 | 0.9550 | 22314208 |
0.2686 | 0.9252 | 445 | 0.9546 | 22566492 |
0.2733 | 0.9356 | 450 | 0.9567 | 22819036 |
0.2783 | 0.9460 | 455 | 0.9570 | 23073228 |
0.2188 | 0.9564 | 460 | 0.9528 | 23324836 |
0.275 | 0.9668 | 465 | 0.9525 | 23582856 |
0.2923 | 0.9772 | 470 | 0.9532 | 23837532 |
0.2072 | 0.9876 | 475 | 0.9537 | 24088876 |
0.2018 | 0.9980 | 480 | 0.9550 | 24347300 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
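
A quick environment check against the versions listed above (a convenience sketch, not part of the original card):

```python
import datasets, tokenizers, torch, transformers

for name, mod in [("Transformers", transformers), ("PyTorch", torch),
                  ("Datasets", datasets), ("Tokenizers", tokenizers)]:
    print(f"{name}: {mod.__version__}")
```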