long-t5-base-govreport

This model is a fine-tuned version of google/long-t5-tglobal-base on the pszemraj/govreport-summarization-8192 dataset. It achieves the following results on the evaluation set:

  • Gen Len: 787.34
  • Loss: 1.5448
  • ROUGE-1: 57.2303
  • ROUGE-2: 24.9705
  • ROUGE-L: 26.8081
  • ROUGE-Lsum: 54.2747
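
The ROUGE values above are reported on a 0-100 scale. A minimal sketch of how scores of this kind can be computed with the evaluate library (an assumption about tooling, not a record of the exact evaluation script):

```python
import evaluate

# Assumed tooling: the evaluate library's ROUGE metric, with scores scaled to 0-100.
rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["generated summary text"],
    references=["reference summary text"],
    use_stemmer=True,
)
print({name: round(value * 100, 4) for name, value in scores.items()})
```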

Model description

More information needed

Intended uses & limitations

More information needed
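
The checkpoint is intended for abstractive summarization of long government reports. A minimal inference sketch using the transformers summarization pipeline is shown below; the model ID is the one used for this page, and the generation settings are illustrative assumptions rather than the evaluation configuration.

```python
from transformers import pipeline

# Minimal inference sketch; generation settings are illustrative.
summarizer = pipeline("summarization", model="AleBurzio/long-t5-base-govreport")

long_report = "..."  # a long government report (the training data uses inputs up to 8192 tokens)
result = summarizer(long_report, max_length=1024, truncation=True)
print(result[0]["summary_text"])
```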

Training and evaluation data

The model was fine-tuned and evaluated on the pszemraj/govreport-summarization-8192 dataset.
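
A minimal sketch for loading that dataset with the datasets library (splits and column names should be checked against the dataset card before preprocessing):

```python
from datasets import load_dataset

# Load the GovReport summarization corpus referenced above and inspect its structure.
dataset = load_dataset("pszemraj/govreport-summarization-8192")
print(dataset)
```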

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 3
  • eval_batch_size: 1
  • seed: 4299
  • gradient_accumulation_steps: 128
  • total_train_batch_size: 384
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 25.0
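
A sketch of how these settings map onto Seq2SeqTrainingArguments from transformers; it is illustrative only (output_dir is a placeholder, and the preprocessing, model loading, and Seq2SeqTrainer call are omitted):

```python
from transformers import Seq2SeqTrainingArguments

# Mirrors the hyperparameters listed above; the Adam betas/epsilon are the library
# defaults (0.9, 0.999, 1e-08), so they are not set explicitly.
training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-base-govreport",  # placeholder
    learning_rate=2e-4,
    per_device_train_batch_size=3,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=128,      # effective train batch size: 3 * 128 = 384
    seed=4299,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    num_train_epochs=25.0,
    predict_with_generate=True,           # assumption: needed to report ROUGE and Gen Len during eval
)
```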

Training results

| Training Loss | Epoch | Step | Gen Len | Validation Loss | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum |
|:-------------:|:-----:|:----:|:-------:|:---------------:|:-------:|:-------:|:-------:|:----------:|
| 2.1198 | 0.39 | 25 | 805.336 | 1.8720 | 29.4332 | 7.3761 | 17.0816 | 25.065 |
| 1.8609 | 0.78 | 50 | 833.404 | 1.7601 | 35.3533 | 10.6624 | 18.643 | 31.6979 |
| 1.7805 | 1.17 | 75 | 866.356 | 1.6833 | 36.5786 | 11.1185 | 20.0358 | 33.2116 |
| 1.7352 | 1.56 | 100 | 822.348 | 1.6524 | 40.5489 | 13.0695 | 20.1256 | 37.1369 |
| 1.7371 | 1.95 | 125 | 765.6 | 1.6294 | 43.8594 | 15.2962 | 20.7807 | 40.3461 |
| 1.6428 | 2.34 | 150 | 844.184 | 1.6055 | 44.5054 | 15.731 | 21.2582 | 40.9775 |
| 1.6567 | 2.73 | 175 | 857.236 | 1.6031 | 47.3641 | 16.9664 | 21.4998 | 43.994 |
| 1.5773 | 3.12 | 200 | 841.86 | 1.5855 | 47.2284 | 17.3099 | 21.6793 | 43.9018 |
| 1.5614 | 3.52 | 225 | 832.8 | 1.5883 | 46.4612 | 17.1368 | 21.5931 | 43.1184 |
| 1.5328 | 3.91 | 250 | 790.056 | 1.5730 | 46.5685 | 17.5423 | 22.2082 | 43.1811 |
| 1.5194 | 4.3 | 275 | 825.868 | 1.5690 | 47.6205 | 18.377 | 22.7639 | 44.3701 |
| 1.571 | 4.69 | 300 | 794.032 | 1.5676 | 49.2203 | 19.1109 | 22.8005 | 46.0679 |
| 1.4275 | 5.08 | 325 | 833.068 | 1.5656 | 50.6982 | 20.0278 | 23.5585 | 47.5036 |
| 1.4912 | 5.47 | 350 | 793.068 | 1.5625 | 50.3371 | 19.8639 | 23.3666 | 47.1898 |
| 1.4764 | 5.86 | 375 | 819.86 | 1.5532 | 50.9702 | 20.7532 | 23.8765 | 47.9915 |
| 1.3972 | 6.25 | 400 | 770.78 | 1.5564 | 49.279 | 19.4781 | 23.1018 | 46.1942 |
| 1.4479 | 6.64 | 425 | 806.244 | 1.5529 | 50.3317 | 20.2888 | 23.4454 | 47.3491 |
| 1.4567 | 7.03 | 450 | 787.48 | 1.5590 | 52.2209 | 21.2868 | 23.9284 | 49.1691 |
| 1.3933 | 7.42 | 475 | 842.664 | 1.5561 | 51.9578 | 20.5806 | 23.7177 | 48.9121 |
| 1.4245 | 7.81 | 500 | 813.772 | 1.5420 | 52.3725 | 21.7787 | 24.5209 | 49.4003 |
| 1.3033 | 8.2 | 525 | 824.66 | 1.5499 | 52.7839 | 21.589 | 24.5617 | 49.8609 |
| 1.3673 | 8.59 | 550 | 807.348 | 1.5530 | 53.2339 | 22.152 | 24.7587 | 50.2502 |
| 1.3634 | 8.98 | 575 | 767.952 | 1.5458 | 53.0293 | 22.3194 | 25.174 | 50.078 |
| 1.3095 | 9.37 | 600 | 856.252 | 1.5412 | 53.7658 | 22.5229 | 25.0448 | 50.708 |
| 1.3492 | 9.76 | 625 | 826.064 | 1.5389 | 51.8662 | 21.6229 | 24.6819 | 48.8648 |
| 1.3007 | 10.16 | 650 | 843.544 | 1.5404 | 53.6692 | 22.154 | 24.6218 | 50.6864 |
| 1.2729 | 10.55 | 675 | 808.764 | 1.5428 | 54.6479 | 23.3029 | 25.5647 | 51.6394 |
| 1.3758 | 10.94 | 700 | 800.152 | 1.5403 | 54.9418 | 23.3323 | 25.6087 | 51.9256 |
| 1.3357 | 11.33 | 725 | 814.496 | 1.5455 | 55.2511 | 23.5606 | 25.8237 | 52.3183 |
| 1.2817 | 11.72 | 750 | 811.144 | 1.5412 | 55.2847 | 23.6632 | 25.9341 | 52.3146 |
| 1.2771 | 12.11 | 775 | 852.704 | 1.5450 | 55.1956 | 23.5545 | 25.677 | 52.1841 |
| 1.2892 | 12.5 | 800 | 805.844 | 1.5369 | 54.9563 | 23.5105 | 25.8876 | 51.9568 |
| 1.2757 | 12.89 | 825 | 813.476 | 1.5467 | 56.4728 | 24.6875 | 26.4415 | 53.4939 |
| 1.2382 | 13.28 | 850 | 787.34 | 1.5448 | 57.2303 | 24.9705 | 26.8081 | 54.2747 |

Framework versions

  • Transformers 4.25.0.dev0
  • Pytorch 1.13.0+cu117
  • Datasets 2.7.0
  • Tokenizers 0.13.2