tfa_output_2025_m02_d02_t23h_29m_10s

This model is a fine-tuned version of gpt2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.5286
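
A minimal loading and generation sketch, assuming the checkpoint is published under the repo id shown for this card (brando/tfa_output_2025_m02_d02_t23h_29m_10s); the prompt and sampling settings are illustrative only:

```python
# Minimal usage sketch; the repo id is taken from this card and is assumed to be public.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "brando/tfa_output_2025_m02_d02_t23h_29m_10s"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Illustrative prompt and sampling settings (not from the card).
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```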

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 1e-06
  • train_batch_size: 2
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • optimizer: paged_adamw (OptimizerNames.PAGED_ADAMW) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 100
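
For reference, a sketch of how these settings map onto TrainingArguments in Transformers 4.48.0. The output_dir is a placeholder, and the optim string "paged_adamw_32bit" is an assumption about which concrete variant OptimizerNames.PAGED_ADAMW refers to; only the listed hyperparameters come from this card.

```python
# Illustrative reconstruction of the training configuration above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="tfa_output",              # placeholder
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=4,        # 2 per device x 4 steps = total train batch size 8
    optim="paged_adamw_32bit",            # assumption: concrete name behind OptimizerNames.PAGED_ADAMW
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=100,
)
```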

Training results

Training Loss Epoch Step Validation Loss
No log 0 0 3.1242
3.6266 0.6667 1 3.1242
1.7421 1.0 2 3.1070
3.5691 1.6667 3 3.0863
1.8145 2.0 4 3.0757
3.5803 2.6667 5 3.0580
1.7267 3.0 6 3.0413
3.5001 3.6667 7 3.0321
1.756 4.0 8 3.0203
3.5043 4.6667 9 3.0064
1.6752 5.0 10 2.9966
3.449 5.6667 11 2.9887
1.7513 6.0 12 2.9786
3.4134 6.6667 13 2.9696
1.7109 7.0 14 2.9604
3.4382 7.6667 15 2.9518
1.6749 8.0 16 2.9431
3.3344 8.6667 17 2.9329
1.7577 9.0 18 2.9260
3.3458 9.6667 19 2.9184
1.6737 10.0 20 2.9092
3.4422 10.6667 21 2.8993
1.5953 11.0 22 2.8892
3.3445 11.6667 23 2.8819
1.631 12.0 24 2.8721
3.3118 12.6667 25 2.8626
1.6067 13.0 26 2.8547
3.2708 13.6667 27 2.8468
1.619 14.0 28 2.8393
3.3062 14.6667 29 2.8312
1.6059 15.0 30 2.8240
3.2822 15.6667 31 2.8154
1.607 16.0 32 2.8071
3.2488 16.6667 33 2.7995
1.6078 17.0 34 2.7929
3.2022 17.6667 35 2.7875
1.5858 18.0 36 2.7801
3.1965 18.6667 37 2.7720
1.5955 19.0 38 2.7656
3.1891 19.6667 39 2.7606
1.5755 20.0 40 2.7527
3.0662 20.6667 41 2.7467
1.6257 21.0 42 2.7411
3.1364 21.6667 43 2.7350
1.5211 22.0 44 2.7288
3.157 22.6667 45 2.7235
1.4631 23.0 46 2.7158
3.1188 23.6667 47 2.7099
1.4971 24.0 48 2.7063
2.98 24.6667 49 2.7008
1.634 25.0 50 2.6942
3.016 25.6667 51 2.6879
1.5771 26.0 52 2.6843
3.0495 26.6667 53 2.6808
1.4922 27.0 54 2.6750
2.9655 27.6667 55 2.6711
1.6188 28.0 56 2.6655
3.0155 28.6667 57 2.6611
1.4867 29.0 58 2.6567
3.0117 29.6667 59 2.6515
1.5069 30.0 60 2.6470
3.0118 30.6667 61 2.6441
1.4577 31.0 62 2.6375
3.0372 31.6667 63 2.6350
1.411 32.0 64 2.6295
2.9611 32.6667 65 2.6267
1.4289 33.0 66 2.6246
2.9595 33.6667 67 2.6207
1.437 34.0 68 2.6166
2.9483 34.6667 69 2.6127
1.4469 35.0 70 2.6114
2.9291 35.6667 71 2.6067
1.411 36.0 72 2.6021
2.9534 36.6667 73 2.5988
1.4295 37.0 74 2.5958
2.9181 37.6667 75 2.5929
1.4138 38.0 76 2.5891
2.9133 38.6667 77 2.5855
1.4172 39.0 78 2.5818
2.8655 39.6667 79 2.5809
1.3988 40.0 80 2.5780
2.929 40.6667 81 2.5750
1.3445 41.0 82 2.5712
2.8141 41.6667 83 2.5696
1.503 42.0 84 2.5668
2.8483 42.6667 85 2.5636
1.4017 43.0 86 2.5622
2.8643 43.6667 87 2.5575
1.3592 44.0 88 2.5553
2.8332 44.6667 89 2.5537
1.3675 45.0 90 2.5503
2.742 45.6667 91 2.5478
1.5006 46.0 92 2.5453
2.7909 46.6667 93 2.5436
1.4314 47.0 94 2.5406
2.7937 47.6667 95 2.5382
1.3617 48.0 96 2.5359
2.8299 48.6667 97 2.5343
1.3295 49.0 98 2.5306
2.7586 49.6667 99 2.5297
1.4496 50.0 100 2.5286
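
The validation losses above are the standard causal-language-modeling cross-entropy reported by Trainer.evaluate(). A self-contained sketch of that computation is below; the toy evaluation text is a placeholder, since the evaluation data for this run is not documented, so the printed value will not match 2.5286.

```python
# Illustrative evaluation sketch; the eval text is a placeholder.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

repo_id = "brando/tfa_output_2025_m02_d02_t23h_29m_10s"  # assumed public checkpoint from this card
tokenizer = AutoTokenizer.from_pretrained(repo_id)
tokenizer.pad_token = tokenizer.eos_token                 # GPT-2 has no pad token
model = AutoModelForCausalLM.from_pretrained(repo_id)

eval_ds = Dataset.from_dict({"text": ["a placeholder evaluation sentence"]})
eval_ds = eval_ds.map(lambda b: tokenizer(b["text"], truncation=True),
                      batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tmp_eval", per_device_eval_batch_size=8),
    eval_dataset=eval_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
print(trainer.evaluate()["eval_loss"])  # cross-entropy loss on the placeholder data
```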

Framework versions

  • Transformers 4.48.0
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0
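
A pinned-environment sketch inferred from the versions above (not an actual requirements file from this run); the "+cu124" tag on PyTorch suggests a CUDA 12.4 wheel, which comes from the PyTorch CUDA 12.4 index rather than plain PyPI:

```
transformers==4.48.0
torch==2.5.1        # +cu124 build from the PyTorch CUDA 12.4 wheel index
datasets==3.2.0
tokenizers==0.21.0
```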

Model weights

  • Format: Safetensors
  • Size: 124M params
  • Tensor type: F32

Model tree for brando/tfa_output_2025_m02_d02_t23h_29m_10s

Finetuned
(1362)
this model