long-t5-base-govreport

This model is a fine-tuned version of google/long-t5-tglobal-base on the pszemraj/govreport-summarization-8192 dataset. It achieves the following results on the evaluation set:

  • Gen Len: 787.34
  • Loss: 1.5448
  • ROUGE-1: 57.2303
  • ROUGE-2: 24.9705
  • ROUGE-L: 26.8081
  • ROUGE-Lsum: 54.2747
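
The ROUGE values above are reported on a 0-100 scale. A minimal sketch of how scores of this kind can be computed with the evaluate library (an assumption about tooling, not a record of the exact evaluation script):

```python
import evaluate

# Assumed tooling: the evaluate library's ROUGE metric, with scores scaled to 0-100.
rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["generated summary text"],
    references=["reference summary text"],
    use_stemmer=True,
)
print({name: round(value * 100, 4) for name, value in scores.items()})
```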

Model description

More information needed

Intended uses & limitations

More information needed
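
The checkpoint is intended for abstractive summarization of long government reports. A minimal inference sketch using the transformers summarization pipeline is shown below; the model ID is the one used for this page, and the generation settings are illustrative assumptions rather than the evaluation configuration.

```python
from transformers import pipeline

# Minimal inference sketch; generation settings are illustrative.
summarizer = pipeline("summarization", model="AleBurzio/long-t5-base-govreport")

long_report = "..."  # a long government report (the training data uses inputs up to 8192 tokens)
result = summarizer(long_report, max_length=1024, truncation=True)
print(result[0]["summary_text"])
```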

Training and evaluation data

The model was fine-tuned and evaluated on the pszemraj/govreport-summarization-8192 dataset.
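
A minimal sketch for loading that dataset with the datasets library (splits and column names should be checked against the dataset card before preprocessing):

```python
from datasets import load_dataset

# Load the GovReport summarization corpus referenced above and inspect its structure.
dataset = load_dataset("pszemraj/govreport-summarization-8192")
print(dataset)
```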

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 3
  • eval_batch_size: 1
  • seed: 4299
  • gradient_accumulation_steps: 128
  • total_train_batch_size: 384
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 25.0
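
A sketch of how these settings map onto Seq2SeqTrainingArguments from transformers; it is illustrative only (output_dir is a placeholder, and the preprocessing, model loading, and Seq2SeqTrainer call are omitted):

```python
from transformers import Seq2SeqTrainingArguments

# Mirrors the hyperparameters listed above; the Adam betas/epsilon are the library
# defaults (0.9, 0.999, 1e-08), so they are not set explicitly.
training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-base-govreport",  # placeholder
    learning_rate=2e-4,
    per_device_train_batch_size=3,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=128,      # effective train batch size: 3 * 128 = 384
    seed=4299,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    num_train_epochs=25.0,
    predict_with_generate=True,           # assumption: needed to report ROUGE and Gen Len during eval
)
```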

Training results

| Training Loss | Epoch | Step | Gen Len | Validation Loss | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum |
|:-------------:|:-----:|:----:|:-------:|:---------------:|:-------:|:-------:|:-------:|:----------:|
| 2.1198 | 0.39 | 25 | 805.336 | 1.8720 | 29.4332 | 7.3761 | 17.0816 | 25.065 |
| 1.8609 | 0.78 | 50 | 833.404 | 1.7601 | 35.3533 | 10.6624 | 18.643 | 31.6979 |
| 1.7805 | 1.17 | 75 | 866.356 | 1.6833 | 36.5786 | 11.1185 | 20.0358 | 33.2116 |
| 1.7352 | 1.56 | 100 | 822.348 | 1.6524 | 40.5489 | 13.0695 | 20.1256 | 37.1369 |
| 1.7371 | 1.95 | 125 | 765.6 | 1.6294 | 43.8594 | 15.2962 | 20.7807 | 40.3461 |
| 1.6428 | 2.34 | 150 | 844.184 | 1.6055 | 44.5054 | 15.731 | 21.2582 | 40.9775 |
| 1.6567 | 2.73 | 175 | 857.236 | 1.6031 | 47.3641 | 16.9664 | 21.4998 | 43.994 |
| 1.5773 | 3.12 | 200 | 841.86 | 1.5855 | 47.2284 | 17.3099 | 21.6793 | 43.9018 |
| 1.5614 | 3.52 | 225 | 832.8 | 1.5883 | 46.4612 | 17.1368 | 21.5931 | 43.1184 |
| 1.5328 | 3.91 | 250 | 790.056 | 1.5730 | 46.5685 | 17.5423 | 22.2082 | 43.1811 |
| 1.5194 | 4.3 | 275 | 825.868 | 1.5690 | 47.6205 | 18.377 | 22.7639 | 44.3701 |
| 1.571 | 4.69 | 300 | 794.032 | 1.5676 | 49.2203 | 19.1109 | 22.8005 | 46.0679 |
| 1.4275 | 5.08 | 325 | 833.068 | 1.5656 | 50.6982 | 20.0278 | 23.5585 | 47.5036 |
| 1.4912 | 5.47 | 350 | 793.068 | 1.5625 | 50.3371 | 19.8639 | 23.3666 | 47.1898 |
| 1.4764 | 5.86 | 375 | 819.86 | 1.5532 | 50.9702 | 20.7532 | 23.8765 | 47.9915 |
| 1.3972 | 6.25 | 400 | 770.78 | 1.5564 | 49.279 | 19.4781 | 23.1018 | 46.1942 |
| 1.4479 | 6.64 | 425 | 806.244 | 1.5529 | 50.3317 | 20.2888 | 23.4454 | 47.3491 |
| 1.4567 | 7.03 | 450 | 787.48 | 1.5590 | 52.2209 | 21.2868 | 23.9284 | 49.1691 |
| 1.3933 | 7.42 | 475 | 842.664 | 1.5561 | 51.9578 | 20.5806 | 23.7177 | 48.9121 |
| 1.4245 | 7.81 | 500 | 813.772 | 1.5420 | 52.3725 | 21.7787 | 24.5209 | 49.4003 |
| 1.3033 | 8.2 | 525 | 824.66 | 1.5499 | 52.7839 | 21.589 | 24.5617 | 49.8609 |
| 1.3673 | 8.59 | 550 | 807.348 | 1.5530 | 53.2339 | 22.152 | 24.7587 | 50.2502 |
| 1.3634 | 8.98 | 575 | 767.952 | 1.5458 | 53.0293 | 22.3194 | 25.174 | 50.078 |
| 1.3095 | 9.37 | 600 | 856.252 | 1.5412 | 53.7658 | 22.5229 | 25.0448 | 50.708 |
| 1.3492 | 9.76 | 625 | 826.064 | 1.5389 | 51.8662 | 21.6229 | 24.6819 | 48.8648 |
| 1.3007 | 10.16 | 650 | 843.544 | 1.5404 | 53.6692 | 22.154 | 24.6218 | 50.6864 |
| 1.2729 | 10.55 | 675 | 808.764 | 1.5428 | 54.6479 | 23.3029 | 25.5647 | 51.6394 |
| 1.3758 | 10.94 | 700 | 800.152 | 1.5403 | 54.9418 | 23.3323 | 25.6087 | 51.9256 |
| 1.3357 | 11.33 | 725 | 814.496 | 1.5455 | 55.2511 | 23.5606 | 25.8237 | 52.3183 |
| 1.2817 | 11.72 | 750 | 811.144 | 1.5412 | 55.2847 | 23.6632 | 25.9341 | 52.3146 |
| 1.2771 | 12.11 | 775 | 852.704 | 1.5450 | 55.1956 | 23.5545 | 25.677 | 52.1841 |
| 1.2892 | 12.5 | 800 | 805.844 | 1.5369 | 54.9563 | 23.5105 | 25.8876 | 51.9568 |
| 1.2757 | 12.89 | 825 | 813.476 | 1.5467 | 56.4728 | 24.6875 | 26.4415 | 53.4939 |
| 1.2382 | 13.28 | 850 | 787.34 | 1.5448 | 57.2303 | 24.9705 | 26.8081 | 54.2747 |

Framework versions

  • Transformers 4.25.0.dev0
  • Pytorch 1.13.0+cu117
  • Datasets 2.7.0
  • Tokenizers 0.13.2