collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0917
  • Num Input Tokens Seen: 20879032
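
Since no usage details are provided, the snippet below is a minimal loading-and-generation sketch. It assumes the checkpoint is available on the Hugging Face Hub under the repo id RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd1; the prompt and generation settings are illustrative only.

```python
# Minimal sketch, not an official usage example from the model author.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)  # illustrative settings
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```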

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
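
For reference, the hyperparameters above map roughly onto a transformers TrainingArguments configuration as sketched below. The output directory and any options not listed above are assumptions, since the actual training script is not published in this card.

```python
# Hedged reconstruction of the listed hyperparameters; not the author's actual script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd1",  # assumed name
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,  # 8 x 16 = total train batch size of 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```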

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| No log | 0 | 0 | 1.3909 | 0 |
| 1.5067 | 0.0133 | 5 | 1.3748 | 273504 |
| 1.4081 | 0.0265 | 10 | 1.2830 | 545936 |
| 1.3549 | 0.0398 | 15 | 1.2063 | 825832 |
| 1.1392 | 0.0531 | 20 | 1.1670 | 1098248 |
| 1.0831 | 0.0663 | 25 | 1.1608 | 1374984 |
| 1.0367 | 0.0796 | 30 | 1.1570 | 1651888 |
| 0.8575 | 0.0929 | 35 | 1.1675 | 1925768 |
| 0.8087 | 0.1061 | 40 | 1.1749 | 2202208 |
| 0.8424 | 0.1194 | 45 | 1.1664 | 2483168 |
| 0.7382 | 0.1326 | 50 | 1.1891 | 2759744 |
| 0.6546 | 0.1459 | 55 | 1.1755 | 3041192 |
| 0.6009 | 0.1592 | 60 | 1.1720 | 3312936 |
| 0.65 | 0.1724 | 65 | 1.1656 | 3584968 |
| 0.513 | 0.1857 | 70 | 1.1583 | 3859624 |
| 0.5095 | 0.1990 | 75 | 1.1543 | 4142640 |
| 0.3864 | 0.2122 | 80 | 1.1539 | 4420432 |
| 0.5413 | 0.2255 | 85 | 1.1485 | 4695096 |
| 0.4637 | 0.2388 | 90 | 1.1475 | 4975288 |
| 0.5035 | 0.2520 | 95 | 1.1442 | 5257360 |
| 0.5414 | 0.2653 | 100 | 1.1402 | 5532816 |
| 0.5575 | 0.2786 | 105 | 1.1374 | 5814560 |
| 0.4326 | 0.2918 | 110 | 1.1351 | 6089288 |
| 0.3607 | 0.3051 | 115 | 1.1324 | 6365672 |
| 0.4166 | 0.3184 | 120 | 1.1335 | 6646232 |
| 0.4479 | 0.3316 | 125 | 1.1293 | 6920344 |
| 0.5133 | 0.3449 | 130 | 1.1298 | 7203792 |
| 0.3867 | 0.3581 | 135 | 1.1239 | 7492288 |
| 0.4439 | 0.3714 | 140 | 1.1289 | 7773672 |
| 0.4353 | 0.3847 | 145 | 1.1216 | 8058928 |
| 0.4172 | 0.3979 | 150 | 1.1244 | 8336984 |
| 0.3993 | 0.4112 | 155 | 1.1174 | 8618288 |
| 0.4248 | 0.4245 | 160 | 1.1228 | 8897568 |
| 0.4136 | 0.4377 | 165 | 1.1207 | 9173528 |
| 0.4214 | 0.4510 | 170 | 1.1164 | 9448368 |
| 0.4742 | 0.4643 | 175 | 1.1196 | 9724824 |
| 0.3857 | 0.4775 | 180 | 1.1164 | 10003152 |
| 0.3429 | 0.4908 | 185 | 1.1163 | 10283192 |
| 0.4161 | 0.5041 | 190 | 1.1166 | 10566080 |
| 0.4795 | 0.5173 | 195 | 1.1120 | 10841592 |
| 0.3835 | 0.5306 | 200 | 1.1116 | 11127760 |
| 0.3267 | 0.5439 | 205 | 1.1113 | 11398928 |
| 0.4626 | 0.5571 | 210 | 1.1093 | 11678328 |
| 0.3755 | 0.5704 | 215 | 1.1088 | 11958848 |
| 0.3646 | 0.5837 | 220 | 1.1075 | 12231888 |
| 0.4435 | 0.5969 | 225 | 1.1079 | 12502048 |
| 0.4098 | 0.6102 | 230 | 1.1070 | 12781520 |
| 0.3391 | 0.6234 | 235 | 1.1051 | 13060160 |
| 0.3454 | 0.6367 | 240 | 1.1053 | 13332944 |
| 0.4199 | 0.6500 | 245 | 1.1058 | 13607448 |
| 0.462 | 0.6632 | 250 | 1.1019 | 13886208 |
| 0.3375 | 0.6765 | 255 | 1.1056 | 14167776 |
| 0.3267 | 0.6898 | 260 | 1.1020 | 14443704 |
| 0.3554 | 0.7030 | 265 | 1.1009 | 14719960 |
| 0.3085 | 0.7163 | 270 | 1.1033 | 14994048 |
| 0.4255 | 0.7296 | 275 | 1.0997 | 15265616 |
| 0.4229 | 0.7428 | 280 | 1.1005 | 15537584 |
| 0.4453 | 0.7561 | 285 | 1.1007 | 15815648 |
| 0.2962 | 0.7694 | 290 | 1.0979 | 16092416 |
| 0.3443 | 0.7826 | 295 | 1.0984 | 16373224 |
| 0.3969 | 0.7959 | 300 | 1.0989 | 16651856 |
| 0.3985 | 0.8092 | 305 | 1.0961 | 16930456 |
| 0.3441 | 0.8224 | 310 | 1.0960 | 17210416 |
| 0.4218 | 0.8357 | 315 | 1.0953 | 17492744 |
| 0.3387 | 0.8489 | 320 | 1.0962 | 17770488 |
| 0.4258 | 0.8622 | 325 | 1.0944 | 18041344 |
| 0.3612 | 0.8755 | 330 | 1.0946 | 18322400 |
| 0.3957 | 0.8887 | 335 | 1.0933 | 18602800 |
| 0.414 | 0.9020 | 340 | 1.0927 | 18878536 |
| 0.4075 | 0.9153 | 345 | 1.0920 | 19160392 |
| 0.3823 | 0.9285 | 350 | 1.0920 | 19441032 |
| 0.3939 | 0.9418 | 355 | 1.0913 | 19714912 |
| 0.3477 | 0.9551 | 360 | 1.0919 | 19991552 |
| 0.3052 | 0.9683 | 365 | 1.0912 | 20265240 |
| 0.3146 | 0.9816 | 370 | 1.0887 | 20542976 |
| 0.3115 | 0.9949 | 375 | 1.0915 | 20823576 |
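
The validation-loss column corresponds to the model's loss on the (unpublished) evaluation set. The sketch below shows one way to compute a comparable cross-entropy number on your own held-out text; since the evaluation data is not released, the texts list is a placeholder.

```python
# Sketch of computing a per-example loss comparable to the "Validation Loss" column;
# the evaluation dataset itself is not published in this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd1"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
model.eval()

texts = ["Placeholder held-out example one.", "Placeholder held-out example two."]
losses = []
with torch.no_grad():
    for text in texts:
        enc = tokenizer(text, return_tensors="pt")
        out = model(**enc, labels=enc["input_ids"])  # labels trigger the LM loss
        losses.append(out.loss.item())
print(f"Mean eval loss: {sum(losses) / len(losses):.4f}")
```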

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1