gpt-neo-125M_menuitemexp

This model is a fine-tuned version of EleutherAI/gpt-neo-125M on an unspecified dataset. It achieves the following results on the evaluation set:

Loss: 0.8844 (validation loss at the last logged step, 1500)

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
num_epochs: 25
mixed_precision_training: Native AMP
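
The list above maps fairly directly onto keyword arguments of `transformers.TrainingArguments`. The following is a minimal sketch of that mapping, not the author's actual training script. The `output_dir` value and the helper `build_training_args` are hypothetical, `train_batch_size` is assumed to be a per-device batch size on a single device, and "Native AMP" is assumed to mean `fp16=True`.

```python
# Hyperparameters from the list above, written as TrainingArguments kwargs.
TRAINING_KWARGS = {
    "output_dir": "gpt-neo-125M_menuitemexp",  # hypothetical output path
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 8,  # assumes one device, no accumulation
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine",
    "num_train_epochs": 25,
    "fp16": True,  # assumption: "Native AMP" refers to fp16 autocast
}


def build_training_args(**overrides):
    """Hypothetical helper: build TrainingArguments from the kwargs above."""
    # Imported lazily so the dictionary can be inspected without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(**{**TRAINING_KWARGS, **overrides})
```

Calling `build_training_args()` requires the `transformers` package, and `fp16=True` generally needs a GPU; passing `fp16=False` as an override is one way to try the configuration on CPU.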

Training Loss	Epoch	Step	Validation Loss
9.1319	0.4918	30	7.7822
7.0116	0.9836	60	6.2200
5.5238	1.4754	90	4.9230
4.2988	1.9672	120	3.8166
3.296	2.4590	150	2.9837
2.5326	2.9508	180	2.2714
1.8979	3.4426	210	1.8421
1.6111	3.9344	240	1.5914
1.3322	4.4262	270	1.4063
1.1786	4.9180	300	1.2800
1.0535	5.4098	330	1.1787
0.9352	5.9016	360	1.1194
0.8669	6.3934	390	1.0640
0.8312	6.8852	420	1.0327
0.7797	7.3770	450	1.0137
0.7653	7.8689	480	0.9842
0.7149	8.3607	510	0.9717
0.7059	8.8525	540	0.9627
0.6857	9.3443	570	0.9478
0.6648	9.8361	600	0.9424
0.654	10.3279	630	0.9343
0.6452	10.8197	660	0.9258
0.6032	11.3115	690	0.9343
0.6174	11.8033	720	0.9123
0.5936	12.2951	750	0.9071
0.5865	12.7869	780	0.9011
0.5975	13.2787	810	0.8992
0.5714	13.7705	840	0.8958
0.5533	14.2623	870	0.8996
0.5508	14.7541	900	0.8985
0.5496	15.2459	930	0.8930
0.5389	15.7377	960	0.8943
0.5453	16.2295	990	0.8915
0.5355	16.7213	1020	0.8863
0.5271	17.2131	1050	0.8894
0.5276	17.7049	1080	0.8884
0.5131	18.1967	1110	0.8891
0.513	18.6885	1140	0.8860
0.5075	19.1803	1170	0.8866
0.5131	19.6721	1200	0.8848
0.5022	20.1639	1230	0.8851
0.5116	20.6557	1260	0.8854
0.5015	21.1475	1290	0.8851
0.5063	21.6393	1320	0.8844
0.5064	22.1311	1350	0.8844
0.4869	22.6230	1380	0.8845
0.5047	23.1148	1410	0.8849
0.5027	23.6066	1440	0.8846
0.4911	24.0984	1470	0.8845
0.5007	24.5902	1500	0.8844
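
A few quantities can be derived from the table alone. The sketch below is illustrative and rests on stated assumptions: the reported loss is mean token-level cross-entropy in nats (so perplexity is its exponential), the cosine schedule uses no warmup (none is listed in the hyperparameters), and the run lasted the full 25 epochs. The function `cosine_lr` is a hypothetical stand-in for the scheduler, not the library's implementation.

```python
import math

# Values read from the table and hyperparameter list above.
FIRST_LOGGED_STEP = 30
FIRST_LOGGED_EPOCH = 0.4918
FINAL_VALIDATION_LOSS = 0.8844
NUM_EPOCHS = 25
BASE_LR = 2e-5

# 30 steps cover 0.4918 epochs, so one epoch is about 61 optimizer steps.
steps_per_epoch = round(FIRST_LOGGED_STEP / FIRST_LOGGED_EPOCH)
total_steps = steps_per_epoch * NUM_EPOCHS

# Perplexity is exp(loss) when the loss is mean cross-entropy in nats.
final_perplexity = math.exp(FINAL_VALIDATION_LOSS)


def cosine_lr(step, total=total_steps, base_lr=BASE_LR):
    """Half-cosine decay from base_lr to 0 (assumes no warmup)."""
    progress = min(max(step / total, 0.0), 1.0)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
    print(f"final validation perplexity: {final_perplexity:.2f}")
    for step in (0, 300, 750, 1200, 1500):
        print(f"step {step:>4}: lr = {cosine_lr(step):.2e}")
```

Under these assumptions the learning rate at step 1200 is roughly a tenth of its initial value, which is consistent with the validation loss flattening near 0.884 to 0.885 in the last few epochs. With 61 steps per epoch and a batch size of 8, the training set would hold at most about 488 examples, assuming a single device and no gradient accumulation.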