wikipedia_conv

This model is a fine-tuned version of gpt2 on an unknown dataset. It achieves the following results on the evaluation set:

Model description

More information needed

More information needed

More information needed

The following hyperparameters were used during training:

Training Loss	Epoch	Step	Validation Loss
5.7941	0.0175	1000	5.5612
5.2657	0.0350	2000	5.1503
5.0438	0.0526	3000	4.9584
4.9125	0.0701	4000	4.8251
4.8024	0.0876	5000	4.7222
4.7245	0.1051	6000	4.6339
4.6491	0.1226	7000	4.5608
4.5966	0.1401	8000	4.5026
4.5466	0.1577	9000	4.4498
4.5008	0.1752	10000	4.4027
4.4624	0.1927	11000	4.3679
4.4255	0.2102	12000	4.3319
4.4001	0.2277	13000	4.3000
4.373	0.2453	14000	4.2727
4.3503	0.2628	15000	4.2483
4.3254	0.2803	16000	4.2283
4.2975	0.2978	17000	4.2071
4.2917	0.3153	18000	4.1871
4.2657	0.3329	19000	4.1669
4.2558	0.3504	20000	4.1560
4.2321	0.3679	21000	4.1401
4.2249	0.3854	22000	4.1265
4.2113	0.4029	23000	4.1118
4.1946	0.4204	24000	4.0979
4.1946	0.4380	25000	4.0872
4.1766	0.4555	26000	4.0777
4.169	0.4730	27000	4.0686
4.1504	0.4905	28000	4.0575
4.1495	0.5080	29000	4.0473
4.137	0.5256	30000	4.0410
4.1313	0.5431	31000	4.0332
4.1195	0.5606	32000	4.0254
4.1087	0.5781	33000	4.0167
4.1138	0.5956	34000	4.0113
4.0945	0.6132	35000	4.0041
4.096	0.6307	36000	3.9989
4.0764	0.6482	37000	3.9927
4.0872	0.6657	38000	3.9898
4.0803	0.6832	39000	3.9823
4.0741	0.7007	40000	3.9754
4.0679	0.7183	41000	3.9722
4.0606	0.7358	42000	3.9702
4.062	0.7533	43000	3.9622
4.0412	0.7708	44000	3.9598
4.0503	0.7883	45000	3.9542
4.039	0.8059	46000	3.9550
4.0325	0.8234	47000	3.9446
4.0396	0.8409	48000	3.9425
4.0289	0.8584	49000	3.9371
4.0372	0.8759	50000	3.9370
4.0205	0.8935	51000	3.9345
4.0238	0.9110	52000	3.9304
4.0112	0.9285	53000	3.9281
4.0153	0.9460	54000	3.9233
4.0048	0.9635	55000	3.9192
4.0031	0.9810	56000	3.9197
4.0114	0.9986	57000	3.9145