gpt-neo-125m-finetuned-philosopher_rave_100

This model is a fine-tuned version of EleutherAI/gpt-neo-125m on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.3681
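
Since the card does not yet document usage, the sketch below shows one plausible way to load the checkpoint and relate the reported evaluation loss to perplexity. The repository id is taken from the model tree at the end of this card; the prompt and generation settings are purely illustrative.

```python
import math

from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id as listed in the model tree section of this card.
repo_id = "Triangles/gpt-neo-125m-finetuned-philosopher_rave_100"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# The reported evaluation loss is a mean cross-entropy per token, so the
# implied evaluation perplexity is exp(loss).
print(f"eval perplexity ≈ {math.exp(2.3681):.2f}")  # ≈ 10.68

# Illustrative prompt only; the fine-tuning data and intended use are undocumented.
prompt = "What is the nature of knowledge?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```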

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a sketch of how they map onto Hugging Face TrainingArguments follows the list:

  • learning_rate: 3e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 100.0
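
For anyone trying to reproduce a comparable run, the sketch below maps these values onto Hugging Face TrainingArguments. The fine-tuning dataset, tokenization, and output path are not documented, so the placeholders here (output_dir, train_ds, eval_ds) are assumptions rather than the original training script.

```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model = "EleutherAI/gpt-neo-125m"
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # GPT-Neo ships without a pad token
model = AutoModelForCausalLM.from_pretrained(base_model)

# Values mirror the hyperparameters listed above; the Adam betas/epsilon and the
# linear scheduler are the Trainer defaults, so they need no extra arguments.
args = TrainingArguments(
    output_dir="gpt-neo-125m-finetuned-philosopher_rave_100",  # assumed name
    learning_rate=3e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=100.0,
    evaluation_strategy="epoch",  # matches the per-epoch validation losses below
)

# train_ds / eval_ds stand in for the undocumented fine-tuning dataset.
# trainer = Trainer(
#     model=model,
#     args=args,
#     train_dataset=train_ds,
#     eval_dataset=eval_ds,
#     data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
# )
# trainer.train()
```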

Training results

Training Loss Epoch Step Validation Loss
No log 1.0 155 2.6967
No log 2.0 310 2.6846
No log 3.0 465 2.6733
2.6891 4.0 620 2.6626
2.6891 5.0 775 2.6524
2.6891 6.0 930 2.6427
2.6569 7.0 1085 2.6336
2.6569 8.0 1240 2.6248
2.6569 9.0 1395 2.6164
2.6215 10.0 1550 2.6083
2.6215 11.0 1705 2.6005
2.6215 12.0 1860 2.5931
2.6022 13.0 2015 2.5858
2.6022 14.0 2170 2.5789
2.6022 15.0 2325 2.5721
2.6022 16.0 2480 2.5657
2.5777 17.0 2635 2.5594
2.5777 18.0 2790 2.5532
2.5777 19.0 2945 2.5473
2.5548 20.0 3100 2.5416
2.5548 21.0 3255 2.5360
2.5548 22.0 3410 2.5306
2.5359 23.0 3565 2.5253
2.5359 24.0 3720 2.5202
2.5359 25.0 3875 2.5152
2.5248 26.0 4030 2.5103
2.5248 27.0 4185 2.5056
2.5248 28.0 4340 2.5011
2.5248 29.0 4495 2.4966
2.5053 30.0 4650 2.4922
2.5053 31.0 4805 2.4880
2.5053 32.0 4960 2.4839
2.4871 33.0 5115 2.4798
2.4871 34.0 5270 2.4759
2.4871 35.0 5425 2.4721
2.4808 36.0 5580 2.4683
2.4808 37.0 5735 2.4647
2.4808 38.0 5890 2.4612
2.4659 39.0 6045 2.4577
2.4659 40.0 6200 2.4544
2.4659 41.0 6355 2.4511
2.4517 42.0 6510 2.4479
2.4517 43.0 6665 2.4447
2.4517 44.0 6820 2.4417
2.4517 45.0 6975 2.4387
2.4466 46.0 7130 2.4359
2.4466 47.0 7285 2.4330
2.4466 48.0 7440 2.4303
2.4348 49.0 7595 2.4276
2.4348 50.0 7750 2.4250
2.4348 51.0 7905 2.4225
2.4238 52.0 8060 2.4201
2.4238 53.0 8215 2.4177
2.4238 54.0 8370 2.4154
2.4172 55.0 8525 2.4131
2.4172 56.0 8680 2.4109
2.4172 57.0 8835 2.4088
2.4172 58.0 8990 2.4067
2.4097 59.0 9145 2.4047
2.4097 60.0 9300 2.4027
2.4097 61.0 9455 2.4008
2.4054 62.0 9610 2.3990
2.4054 63.0 9765 2.3972
2.4054 64.0 9920 2.3955
2.3936 65.0 10075 2.3938
2.3936 66.0 10230 2.3922
2.3936 67.0 10385 2.3906
2.394 68.0 10540 2.3891
2.394 69.0 10695 2.3877
2.394 70.0 10850 2.3863
2.387 71.0 11005 2.3850
2.387 72.0 11160 2.3837
2.387 73.0 11315 2.3824
2.387 74.0 11470 2.3813
2.3812 75.0 11625 2.3801
2.3812 76.0 11780 2.3791
2.3812 77.0 11935 2.3780
2.3812 78.0 12090 2.3771
2.3812 79.0 12245 2.3762
2.3812 80.0 12400 2.3753
2.3802 81.0 12555 2.3745
2.3802 82.0 12710 2.3737
2.3802 83.0 12865 2.3730
2.3687 84.0 13020 2.3723
2.3687 85.0 13175 2.3717
2.3687 86.0 13330 2.3711
2.3687 87.0 13485 2.3706
2.3722 88.0 13640 2.3702
2.3722 89.0 13795 2.3698
2.3722 90.0 13950 2.3694
2.3693 91.0 14105 2.3691
2.3693 92.0 14260 2.3688
2.3693 93.0 14415 2.3686
2.3654 94.0 14570 2.3684
2.3654 95.0 14725 2.3683
2.3654 96.0 14880 2.3682
2.372 97.0 15035 2.3682
2.372 98.0 15190 2.3681
2.372 99.0 15345 2.3681
2.3664 100.0 15500 2.3681

Framework versions

  • Transformers 4.39.3
  • Pytorch 2.2.1+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2
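
To approximate the reported environment, the versions above can be checked (or pinned) before loading the model; the snippet below is a convenience sketch, and the "+cu121" suffix on the PyTorch build depends on the local CUDA install.

```python
import datasets
import tokenizers
import torch
import transformers

# Versions reported on this card.
expected = {
    transformers: "4.39.3",
    torch: "2.2.1",       # card reports 2.2.1+cu121
    datasets: "2.18.0",
    tokenizers: "0.15.2",
}

for module, version in expected.items():
    print(f"{module.__name__}: installed {module.__version__}, card reports {version}")
```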

Model tree for Triangles/gpt-neo-125m-finetuned-philosopher_rave_100

  • Fine-tuned from EleutherAI/gpt-neo-125m