long-t5-tglobal-large-pubmed-3k-booksum-16384-WIP

NOTE: this checkpoint is still a work in progress (WIP) and has not converged by any means, but I am sharing it to maybe save others some time :)

Updates

As I update this WIP checkpoint, I will post a note here.

- July 26, 2022: added two more epochs of training; metrics are starting to be almost as good as the more-tuned base variant.
- July 8, 2022: added a checkpoint with ~4 epochs of training on an A100, equating to approximately 350 steps at an effective batch size of 128.
- July 4, 2022: added a checkpoint with six additional epochs of training with the dataset summary outputs filtered to 1024 tokens, resolving the prior issue of short summaries.
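The July 4 note mentions filtering the summary outputs to 1024 tokens. The card does not show how this was done, so the following is only a minimal sketch under stated assumptions: it reads "filtered" as dropping examples whose summary exceeds the limit (rather than truncating them), it assumes the summary lives in a field named `summary_text`, and it counts tokens by whitespace splitting as a stand-in. In practice you would pass a counter based on the model's own tokenizer.

```python
from typing import Callable, Dict, Iterable, List


def whitespace_token_count(text: str) -> int:
    """Stand-in token counter (assumption): counts whitespace-separated words."""
    return len(text.split())


def filter_by_summary_length(
    examples: Iterable[Dict[str, str]],
    max_summary_tokens: int = 1024,
    count_tokens: Callable[[str], int] = whitespace_token_count,
    summary_key: str = "summary_text",  # assumed field name, adjust to your data
) -> List[Dict[str, str]]:
    """Keep only the examples whose summary has at most max_summary_tokens tokens."""
    return [
        example
        for example in examples
        if count_tokens(example[summary_key]) <= max_summary_tokens
    ]


# Toy data to show the behaviour; these are not real booksum rows.
toy_examples = [
    {"chapter": "some long chapter text", "summary_text": "a short summary"},
    {"chapter": "another chapter", "summary_text": "word " * 2000},
]
kept = filter_by_summary_length(toy_examples, max_summary_tokens=1024)
print(len(kept))
```

To count real tokens instead, a callable such as `lambda text: len(tokenizer(text).input_ids)` could be passed as `count_tokens`, where `tokenizer` is whatever tokenizer you load for this model.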

About

- A checkpoint of Stancld/longt5-tglobal-large-16384-pubmed-3k_steps, fine-tuned on kmfoda/booksum for about 26 epochs.
- Max input lengths during training varied between 8192 and 16384 tokens, depending on GPU availability. This checkpoint was trained with a max input length of 16384 tokens for the final 10+ epochs.
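Since the final epochs used a 16384-token input limit, a local run would typically truncate inputs to that length. The sketch below shows one way to load the checkpoint with the `transformers` library and summarize a document. The generation settings (`num_beams`, `no_repeat_ngram_size`, `max_new_tokens`) are illustrative assumptions, not values taken from this card, and the function name `summarize` is hypothetical.

```python
MODEL_NAME = "pszemraj/long-t5-tglobal-large-pubmed-3k-booksum-16384-WIP"


def summarize(
    text: str,
    model_name: str = MODEL_NAME,
    max_input_tokens: int = 16384,  # max input length reported for the final epochs
    max_new_tokens: int = 512,      # assumption: pick a budget that suits your summaries
) -> str:
    """Summarize `text` with the checkpoint; requires `transformers` and `torch`."""
    # Imported lazily so that defining this function does not need the libraries.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # Truncate anything beyond the input length the model was trained with.
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=max_input_tokens,
    )
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        num_beams=4,             # assumption: illustrative decoding settings
        no_repeat_ngram_size=3,  # assumption: illustrative decoding settings
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize(long_document_text)` downloads the weights on first use. Long inputs are slow and memory-hungry, so lowering `max_input_tokens` (for example to 8192) is a reasonable first step on smaller GPUs.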

Comparisons

- Compare to pszemraj/led-large-book-summary.
- The inference API has been disabled because it's too compute-intensive :/

Model size: 783M params (Safetensors, tensor type F32)
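As a rough guide to why this model is heavy to serve, the memory needed just for the weights can be estimated from the 783M parameter count and the F32 tensor type. This is back-of-the-envelope arithmetic only: it ignores activations, attention buffers over long inputs, and generation caches, all of which add to the total. The helper name `weight_memory_bytes` is hypothetical.

```python
def weight_memory_bytes(n_params: int, bytes_per_param: int = 4) -> int:
    """Memory for the weights alone: parameter count times bytes per parameter."""
    return n_params * bytes_per_param


N_PARAMS = 783_000_000  # "783M params" as listed for this checkpoint

fp32_bytes = weight_memory_bytes(N_PARAMS, bytes_per_param=4)  # F32 as stored
half_bytes = weight_memory_bytes(N_PARAMS, bytes_per_param=2)  # if cast to a 16-bit type

print(f"F32 weights:    {fp32_bytes / 1e9:.2f} GB")
print(f"16-bit weights: {half_bytes / 1e9:.2f} GB")
```

The F32 figure works out to roughly 3.1 GB for the weights. The 16-bit line only shows the arithmetic; whether this checkpoint behaves well in reduced precision is not stated here.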

Dataset used to train pszemraj/long-t5-tglobal-large-pubmed-3k-booksum-16384-WIP: kmfoda/booksum

Evaluation results

All results below are on the test set and verified.

kmfoda/booksum
- ROUGE-1: 35.997
- ROUGE-2: 5.927
- ROUGE-L: 16.014
- ROUGE-LSUM: 32.941
- loss: 2.934
- gen_len: 283.720

samsum
- ROUGE-1: 26.241
- ROUGE-2: 5.979
- ROUGE-L: 18.747
- ROUGE-LSUM: 22.557
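For readers unfamiliar with the ROUGE-1 and ROUGE-2 numbers, the sketch below shows the idea behind them: the F1 score of n-gram overlap between a reference summary and a generated one. It is a simplified illustration with lowercase whitespace tokenization, not the implementation that produced the verified scores, so its values will not match them exactly. The reported scores appear to be on a 0 to 100 scale, whereas this function returns a value between 0 and 1.

```python
from collections import Counter


def rouge_n_f1(reference: str, candidate: str, n: int = 1) -> float:
    """Simplified ROUGE-N F1: clipped n-gram overlap between two texts."""

    def ngram_counts(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref_counts = ngram_counts(reference)
    cand_counts = ngram_counts(candidate)

    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


reference = "the cat sat"
candidate = "the cat ran"
print(rouge_n_f1(reference, candidate, n=1))  # unigram overlap
print(rouge_n_f1(reference, candidate, n=2))  # bigram overlap
```

In this toy pair, two of three unigrams and one of two bigrams match, which is why ROUGE-2 is lower than ROUGE-1. The same pattern shows up in the scores above, where ROUGE-2 is far below ROUGE-1 on both datasets.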