# collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd2
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1048
- Num Input Tokens Seen: 30937840
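
For quick experimentation, here is a minimal sketch of loading this checkpoint through the standard `transformers` API; the prompt and generation settings are illustrative only, not recommendations from the model author:

```python
# Minimal sketch: load this checkpoint with the standard transformers API.
# Prompt and generation settings are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd2"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```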
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
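
For readers who want to approximate this setup, the list above maps onto `transformers.TrainingArguments` roughly as follows. This is a sketch only: the output path is a hypothetical placeholder, and the actual training script is not published.

```python
# Sketch only: the hyperparameters above expressed as TrainingArguments.
# The output_dir is a hypothetical placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd2",  # hypothetical
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # 8 * 16 = total train batch size of 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,    # Adam betas and epsilon as reported above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```

Note that the reported total train batch size of 128 follows from `per_device_train_batch_size * gradient_accumulation_steps` (8 × 16), assuming a single device.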
### Training results
| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3909 | 0 |
| 1.5329 | 0.0088 | 5 | 1.3818 | 275656 |
| 1.5828 | 0.0177 | 10 | 1.3192 | 552624 |
| 1.428 | 0.0265 | 15 | 1.2548 | 826024 |
| 1.2605 | 0.0353 | 20 | 1.1988 | 1101872 |
| 1.128 | 0.0441 | 25 | 1.1751 | 1372272 |
| 1.1142 | 0.0530 | 30 | 1.1723 | 1649368 |
| 1.0356 | 0.0618 | 35 | 1.1729 | 1923640 |
| 0.8851 | 0.0706 | 40 | 1.1976 | 2191160 |
| 0.8191 | 0.0794 | 45 | 1.2022 | 2461680 |
| 0.7443 | 0.0883 | 50 | 1.2214 | 2734480 |
| 0.6972 | 0.0971 | 55 | 1.2002 | 3009552 |
| 0.6345 | 0.1059 | 60 | 1.1883 | 3280536 |
| 0.6249 | 0.1148 | 65 | 1.1948 | 3550800 |
| 0.5123 | 0.1236 | 70 | 1.1961 | 3821736 |
| 0.4661 | 0.1324 | 75 | 1.1938 | 4096488 |
| 0.491 | 0.1412 | 80 | 1.1864 | 4365312 |
| 0.4568 | 0.1501 | 85 | 1.1824 | 4633488 |
| 0.4108 | 0.1589 | 90 | 1.1803 | 4908632 |
| 0.4298 | 0.1677 | 95 | 1.1849 | 5184840 |
| 0.3863 | 0.1765 | 100 | 1.1807 | 5459872 |
| 0.469 | 0.1854 | 105 | 1.1808 | 5733384 |
| 0.4112 | 0.1942 | 110 | 1.1745 | 6008360 |
| 0.4328 | 0.2030 | 115 | 1.1731 | 6279928 |
| 0.4686 | 0.2119 | 120 | 1.1719 | 6549560 |
| 0.4254 | 0.2207 | 125 | 1.1650 | 6820064 |
| 0.4131 | 0.2295 | 130 | 1.1679 | 7092784 |
| 0.3801 | 0.2383 | 135 | 1.1606 | 7373688 |
| 0.3755 | 0.2472 | 140 | 1.1660 | 7644816 |
| 0.372 | 0.2560 | 145 | 1.1577 | 7921944 |
| 0.376 | 0.2648 | 150 | 1.1549 | 8196384 |
| 0.3551 | 0.2736 | 155 | 1.1556 | 8467640 |
| 0.3409 | 0.2825 | 160 | 1.1497 | 8746104 |
| 0.3663 | 0.2913 | 165 | 1.1528 | 9022632 |
| 0.3558 | 0.3001 | 170 | 1.1509 | 9298176 |
| 0.3088 | 0.3089 | 175 | 1.1524 | 9575872 |
| 0.3736 | 0.3178 | 180 | 1.1473 | 9841528 |
| 0.3377 | 0.3266 | 185 | 1.1481 | 10116520 |
| 0.3473 | 0.3354 | 190 | 1.1428 | 10394896 |
| 0.3137 | 0.3443 | 195 | 1.1425 | 10665848 |
| 0.3163 | 0.3531 | 200 | 1.1433 | 10939128 |
| 0.2973 | 0.3619 | 205 | 1.1410 | 11212464 |
| 0.3062 | 0.3707 | 210 | 1.1446 | 11480560 |
| 0.4125 | 0.3796 | 215 | 1.1397 | 11759744 |
| 0.3505 | 0.3884 | 220 | 1.1423 | 12033656 |
| 0.3489 | 0.3972 | 225 | 1.1403 | 12304952 |
| 0.265 | 0.4060 | 230 | 1.1346 | 12573520 |
| 0.2683 | 0.4149 | 235 | 1.1399 | 12844072 |
| 0.2863 | 0.4237 | 240 | 1.1370 | 13114088 |
| 0.2612 | 0.4325 | 245 | 1.1384 | 13391416 |
| 0.3089 | 0.4414 | 250 | 1.1348 | 13665888 |
| 0.2451 | 0.4502 | 255 | 1.1337 | 13934272 |
| 0.3628 | 0.4590 | 260 | 1.1334 | 14210656 |
| 0.3143 | 0.4678 | 265 | 1.1321 | 14481944 |
| 0.2468 | 0.4767 | 270 | 1.1317 | 14748960 |
| 0.3403 | 0.4855 | 275 | 1.1282 | 15025096 |
| 0.3069 | 0.4943 | 280 | 1.1276 | 15294856 |
| 0.3461 | 0.5031 | 285 | 1.1277 | 15568080 |
| 0.2733 | 0.5120 | 290 | 1.1283 | 15837368 |
| 0.3364 | 0.5208 | 295 | 1.1265 | 16106872 |
| 0.3107 | 0.5296 | 300 | 1.1228 | 16382760 |
| 0.2594 | 0.5385 | 305 | 1.1277 | 16651328 |
| 0.3674 | 0.5473 | 310 | 1.1237 | 16921656 |
| 0.2966 | 0.5561 | 315 | 1.1227 | 17201416 |
| 0.2795 | 0.5649 | 320 | 1.1247 | 17480400 |
| 0.3032 | 0.5738 | 325 | 1.1228 | 17754296 |
| 0.268 | 0.5826 | 330 | 1.1208 | 18024456 |
| 0.2329 | 0.5914 | 335 | 1.1225 | 18296232 |
| 0.293 | 0.6002 | 340 | 1.1196 | 18568008 |
| 0.2789 | 0.6091 | 345 | 1.1186 | 18842272 |
| 0.3291 | 0.6179 | 350 | 1.1215 | 19118304 |
| 0.3131 | 0.6267 | 355 | 1.1179 | 19396528 |
| 0.2905 | 0.6356 | 360 | 1.1180 | 19667944 |
| 0.3705 | 0.6444 | 365 | 1.1168 | 19942280 |
| 0.3211 | 0.6532 | 370 | 1.1155 | 20213920 |
| 0.3426 | 0.6620 | 375 | 1.1159 | 20488320 |
| 0.2674 | 0.6709 | 380 | 1.1158 | 20761120 |
| 0.2985 | 0.6797 | 385 | 1.1161 | 21039632 |
| 0.2743 | 0.6885 | 390 | 1.1135 | 21308888 |
| 0.2949 | 0.6973 | 395 | 1.1175 | 21583640 |
| 0.2632 | 0.7062 | 400 | 1.1148 | 21861936 |
| 0.3536 | 0.7150 | 405 | 1.1137 | 22141144 |
| 0.3069 | 0.7238 | 410 | 1.1147 | 22415856 |
| 0.2709 | 0.7326 | 415 | 1.1140 | 22687328 |
| 0.2526 | 0.7415 | 420 | 1.1131 | 22960104 |
| 0.2865 | 0.7503 | 425 | 1.1115 | 23234496 |
| 0.4072 | 0.7591 | 430 | 1.1117 | 23501504 |
| 0.3175 | 0.7680 | 435 | 1.1102 | 23773256 |
| 0.2798 | 0.7768 | 440 | 1.1101 | 24046768 |
| 0.3312 | 0.7856 | 445 | 1.1094 | 24323872 |
| 0.3448 | 0.7944 | 450 | 1.1098 | 24602456 |
| 0.2342 | 0.8033 | 455 | 1.1093 | 24873976 |
| 0.3352 | 0.8121 | 460 | 1.1091 | 25142944 |
| 0.2058 | 0.8209 | 465 | 1.1062 | 25422584 |
| 0.3473 | 0.8297 | 470 | 1.1066 | 25702288 |
| 0.3227 | 0.8386 | 475 | 1.1085 | 25972656 |
| 0.2548 | 0.8474 | 480 | 1.1072 | 26241896 |
| 0.2785 | 0.8562 | 485 | 1.1055 | 26513264 |
| 0.3941 | 0.8651 | 490 | 1.1053 | 26788024 |
| 0.2188 | 0.8739 | 495 | 1.1053 | 27060584 |
| 0.2283 | 0.8827 | 500 | 1.1057 | 27330952 |
| 0.316 | 0.8915 | 505 | 1.1054 | 27602456 |
| 0.2504 | 0.9004 | 510 | 1.1046 | 27873592 |
| 0.3032 | 0.9092 | 515 | 1.1029 | 28149760 |
| 0.3913 | 0.9180 | 520 | 1.1042 | 28429672 |
| 0.3072 | 0.9268 | 525 | 1.1044 | 28700704 |
| 0.2355 | 0.9357 | 530 | 1.1026 | 28972856 |
| 0.2685 | 0.9445 | 535 | 1.1023 | 29244952 |
| 0.2743 | 0.9533 | 540 | 1.1032 | 29521872 |
| 0.2402 | 0.9622 | 545 | 1.1006 | 29798312 |
| 0.263 | 0.9710 | 550 | 1.1012 | 30071680 |
| 0.3205 | 0.9798 | 555 | 1.1012 | 30341752 |
| 0.2768 | 0.9886 | 560 | 1.1006 | 30611208 |
| 0.3064 | 0.9975 | 565 | 1.1042 | 30884496 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
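
A quick way to check a local environment against these pins is a minimal sketch like the following; the expected values come from the list above:

```python
# Verify installed library versions against those reported above.
# Install pins (run in a shell), e.g.:
#   pip install transformers==4.44.0 datasets==2.20.0 tokenizers==0.19.1
import transformers, torch, datasets, tokenizers

print(transformers.__version__)  # expected: 4.44.0
print(torch.__version__)         # expected: 2.4.0+cu121
print(datasets.__version__)      # expected: 2.20.0
print(tokenizers.__version__)    # expected: 0.19.1
```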