pretrain_2

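This model was trained from scratch; the training dataset is not specified (the card generator recorded it as None). On the evaluation set it reaches a validation loss of 0.5716 at the final logged epoch (epoch 56, step 1360016). The lowest logged validation loss is 0.5715, at epoch 51.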

Model description

More information needed

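The hyperparameters used during training are not listed in this card. The table below records the training loss and validation loss at the end of each epoch; each epoch spans 24286 steps.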

Training Loss	Epoch	Step	Validation Loss
0.7609	1.0	24286	0.6893
0.7239	2.0	48572	0.6476
0.7056	3.0	72858	0.6279
0.6961	4.0	97144	0.6242
0.6838	5.0	121430	0.6123
0.6742	6.0	145716	0.6111
0.6762	7.0	170002	0.6064
0.6722	8.0	194288	0.6052
0.6603	9.0	218574	0.6043
0.6522	10.0	242860	0.6005
0.654	11.0	267146	0.6022
0.6422	12.0	291432	0.5964
0.6495	13.0	315718	0.5967
0.655	14.0	340004	0.5961
0.651	15.0	364290	0.5925
0.6458	16.0	388576	0.5922
0.6441	17.0	412862	0.5901
0.6477	18.0	437148	0.5871
0.6382	19.0	461434	0.5896
0.6426	20.0	485720	0.5878
0.6369	21.0	510006	0.5873
0.6298	22.0	534292	0.5844
0.6388	23.0	558578	0.5863
0.6389	24.0	582864	0.5826
0.6394	25.0	607150	0.5861
0.6295	26.0	631436	0.5848
0.6365	27.0	655722	0.5815
0.6347	28.0	680008	0.5836
0.6384	29.0	704294	0.5870
0.6381	30.0	728580	0.5816
0.6306	31.0	752866	0.5813
0.6385	32.0	777152	0.5838
0.6338	33.0	801438	0.5808
0.6331	34.0	825724	0.5806
0.6235	35.0	850010	0.5825
0.6329	36.0	874296	0.5825
0.6338	37.0	898582	0.5810
0.6257	38.0	922868	0.5803
0.6268	39.0	947154	0.5810
0.6371	40.0	971440	0.5759
0.6272	41.0	995726	0.5775
0.6276	42.0	1020012	0.5771
0.635	43.0	1044298	0.5757
0.6314	44.0	1068584	0.5753
0.6279	45.0	1092870	0.5760
0.6186	46.0	1117156	0.5756
0.6214	47.0	1141442	0.5763
0.6257	48.0	1165728	0.5776
0.6272	49.0	1190014	0.5746
0.6291	50.0	1214300	0.5734
0.6311	51.0	1238586	0.5715
0.6279	52.0	1262872	0.5776
0.6372	53.0	1287158	0.5725
0.6155	54.0	1311444	0.5782
0.6241	55.0	1335730	0.5748
0.6187	56.0	1360016	0.5716
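
The log above can be inspected programmatically, for example to find the epoch with the lowest validation loss or to confirm the number of steps per epoch. The following is a minimal sketch, not part of the original training code: the names `LogRow`, `parse_log`, `best_row`, and `EXCERPT` are hypothetical, and the excerpt contains only the last six rows of the table (epochs 51 to 56), written with spaces instead of tabs.

```python
from typing import List, NamedTuple


class LogRow(NamedTuple):
    """One row of the results table (hypothetical helper type)."""
    train_loss: float
    epoch: float
    step: int
    val_loss: float


def parse_log(text: str) -> List[LogRow]:
    """Parse whitespace-separated rows: training loss, epoch, step, validation loss."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skips blank lines and the multi-word header row
        try:
            rows.append(
                LogRow(float(fields[0]), float(fields[1]), int(fields[2]), float(fields[3]))
            )
        except ValueError:
            continue  # skips any four-field line that is not numeric
    return rows


def best_row(rows: List[LogRow]) -> LogRow:
    """Return the row with the lowest validation loss (first one on ties)."""
    return min(rows, key=lambda r: r.val_loss)


# Last six rows of the table above, copied verbatim.
EXCERPT = """
0.6311 51.0 1238586 0.5715
0.6279 52.0 1262872 0.5776
0.6372 53.0 1287158 0.5725
0.6155 54.0 1311444 0.5782
0.6241 55.0 1335730 0.5748
0.6187 56.0 1360016 0.5716
"""

if __name__ == "__main__":
    rows = parse_log(EXCERPT)
    best = best_row(rows)
    print(f"lowest validation loss in excerpt: {best.val_loss} at epoch {best.epoch:.0f}")
    print(f"steps per epoch: {rows[1].step - rows[0].step}")
```

Passing the full table text to `parse_log` works the same way, since the header row is skipped. Selecting on validation loss alone is one possible checkpoint criterion; the card does not say which checkpoint was actually kept.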