long-t5-tglobal-base-synthsumm_direct

Fine-tuned on a synthetic dataset of curated long-context text paired with GPT-3.5-turbo-1106 summaries spanning multiple domains, plus "random" long-context examples drawn from pretraining datasets.

  • Note: this model has not been fine-tuned on any other summarization datasets, just the synthsumm data

Try it: gradio demo | free HF inference API via requests | .md with example outputs (gauntlet)

Usage

Beam search decoding is recommended for this model. If you prefer, the textsum utility package abstracts most of the setup for you:

pip install -U textsum

from textsum.summarize import Summarizer

model_name = "pszemraj/long-t5-tglobal-base-synthsumm_direct"
summarizer = Summarizer(model_name) # GPU auto-detected
text = "put the text you don't want to read here"
summary = summarizer.summarize_string(text)
print(summary)
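
For readers who would rather call transformers directly, the beam search recommendation above might be sketched roughly as follows. The decoding values in GEN_KWARGS and the 16384-token input cap are illustrative assumptions, not settings published for this model, and the summarize helper is hypothetical.

```python
MODEL_NAME = "pszemraj/long-t5-tglobal-base-synthsumm_direct"

# Assumed decoding settings (not taken from the model card); tune for your data.
GEN_KWARGS = {
    "num_beams": 4,             # beam search, as the card recommends
    "max_new_tokens": 256,      # assumed upper bound on summary length
    "no_repeat_ngram_size": 3,  # assumed guard against repeated phrases
    "early_stopping": True,     # stop once enough finished beams exist
}


def summarize(text, model_name=MODEL_NAME, max_input_tokens=16384):
    """Summarize `text` with beam search; downloads the model on first call."""
    # Imported lazily so the settings above can be inspected without transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    # Truncate inputs longer than the assumed cap instead of failing.
    inputs = tokenizer(
        text, return_tensors="pt", truncation=True, max_length=max_input_tokens
    )
    output_ids = model.generate(**inputs, **GEN_KWARGS)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling summarize("put the text you don't want to read here") should return the summary as a plain string. The model is reloaded on every call in this sketch, so for batch use it would make sense to load the tokenizer and model once and reuse them.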

Details

This model is a fine-tuned version of google/long-t5-tglobal-base on the synthsumm data described above. It achieves the following results on the evaluation set:

  • Loss: 1.4378
  • Rouge1: 48.0918
  • Rouge2: 21.2531
  • RougeL: 34.4307
  • RougeLsum: 43.0271
  • Gen Len: 84.5231
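
As a rough illustration of what the Rouge1 figure measures, the sketch below computes a simplified unigram-overlap F1 between a reference and a candidate summary. It assumes lowercased whitespace tokenization with no stemming, so it will not reproduce the numbers above, which presumably came from a standard ROUGE scorer and are reported on a 0-100 scale. The rouge1_f1 helper and the example sentences are hypothetical.

```python
from collections import Counter


def rouge1_f1(reference, candidate):
    """Simplified ROUGE-1 F1: clipped unigram overlap, whitespace tokens."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1("the cat sat on the mat", "the cat lay on the mat")
print(round(100 * score, 2))  # 83.33
```

Five of the six tokens match in each direction, so precision and recall are both 5/6 and the F1 is the same value.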

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 26605
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: inverse_sqrt
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 2.0
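
To make the batch size and scheduler settings above concrete, here is a small sketch. It assumes a single device, roughly 650 total optimizer steps (extrapolated from the training log, not stated in the card), and an inverse square root schedule whose decay timescale equals the warmup length. The exact schedule used in training may differ in its details.

```python
import math

LEARNING_RATE = 3e-4
TRAIN_BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 8
WARMUP_RATIO = 0.03
NUM_DEVICES = 1    # assumption: single GPU
TOTAL_STEPS = 650  # assumption: about 325 steps per epoch over 2 epochs

# Examples contributing to each optimizer update.
effective_batch = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES
warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def inverse_sqrt_lr(step, base_lr=LEARNING_RATE, warmup=warmup_steps):
    """Linear warmup to base_lr, then decay proportional to 1/sqrt(step)."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * math.sqrt(warmup / max(1, step))


print(effective_batch)  # 8
print(warmup_steps)     # 20
for step in (0, 10, 20, 80, 625):
    print(step, f"{inverse_sqrt_lr(step):.2e}")
```

Under these assumptions the learning rate peaks at 3e-4 when warmup ends and has halved by four times the warmup length (step 80).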

Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum | Gen Len |
|---|---|---|---|---|---|---|---|---|
| 1.9183 | 0.38 | 125 | 1.5762 | 38.7221 | 15.0873 | 28.3123 | 34.9655 | 129.2154 |
| 1.8815 | 0.77 | 250 | 1.5230 | 44.3531 | 17.9384 | 31.7417 | 39.5563 | 87.3538 |
| 1.7264 | 1.15 | 375 | 1.4735 | 45.7781 | 20.102 | 33.329 | 41.4737 | 101.9231 |
| 1.8545 | 1.54 | 500 | 1.4505 | 47.0134 | 20.6159 | 33.6118 | 41.6579 | 88.2308 |
| 1.7444 | 1.92 | 625 | 1.4378 | 48.0918 | 21.2531 | 34.4307 | 43.0271 | 84.5231 |
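
The log above can also be read programmatically. This sketch copies four of its columns into tuples, picks the evaluation step with the lowest validation loss, and reports the Rouge1 gain between the first and last evaluations. The EvalRow type and variable names are hypothetical.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    step: int
    val_loss: float
    rouge1: float
    gen_len: float


# Values copied from the training results above.
LOG = [
    EvalRow(125, 1.5762, 38.7221, 129.2154),
    EvalRow(250, 1.5230, 44.3531, 87.3538),
    EvalRow(375, 1.4735, 45.7781, 101.9231),
    EvalRow(500, 1.4505, 47.0134, 88.2308),
    EvalRow(625, 1.4378, 48.0918, 84.5231),
]

# Lowest validation loss identifies the best checkpoint in this log.
best = min(LOG, key=lambda row: row.val_loss)
rouge1_gain = LOG[-1].rouge1 - LOG[0].rouge1

print(best.step)              # 625
print(round(rouge1_gain, 4))  # 9.3697
```

Validation loss falls at every evaluation, so the final step is also the best one, which matches the headline metrics reported in the Details section.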

Framework versions

  • Transformers 4.36.0.dev0
  • Pytorch 2.1.0
  • Datasets 2.15.0
  • Tokenizers 0.15.0

Model size: 248M params (Safetensors, tensor type F32)