distilbart-sum-arxiv

This model is a fine-tuned version of sshleifer/distilbart-xsum-12-6 on a subset of the ccdv/arxiv-summarization dataset. It achieves the following results on the evaluation set:

  • Loss: 2.420
  • Rouge1: 42.185
  • Rouge2: 15.481
  • RougeL: 24.440
  • RougeLSum: 24.260
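For readers unfamiliar with the ROUGE numbers above, here is a minimal sketch of how an n-gram overlap F-score of this kind can be computed. It is a simplified illustration, not the scorer used to produce the reported results: it assumes lowercased whitespace tokenization, no stemming, and a 0-100 scale, and the helper names `ngrams` and `rouge_n` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, candidate, n=1):
    """Simplified ROUGE-N F-score on a 0-100 scale (whitespace tokens, no stemming)."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    # Clipped overlap: each n-gram counts at most as often as it occurs in both texts.
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 100 * 2 * precision * recall / (precision + recall)
```

Calling `rouge_n(reference, candidate, n=2)` gives the bigram variant. ROUGE-L and ROUGE-Lsum rely on the longest common subsequence instead and are not covered by this sketch.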

Model description

This model is a distilled version of BART with 306M parameters (vs. 406M for the BART model), and it is 1.68 times faster than BART at inference. It was trained on 60,000 samples and accepts inputs of at most 1024 tokens.
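Because full papers often exceed the 1024-token limit, one common workaround is to split the input into windows and summarize each one separately. The following is a minimal sketch of that splitting step only. The function name `chunk_tokens` and the `overlap` parameter are hypothetical, and the token list is assumed to come from the model's own tokenizer. In practice the usable window is slightly smaller than 1024 because special tokens also count toward the limit.

```python
def chunk_tokens(tokens, max_len=1024, overlap=0):
    """Split a token list into windows of at most max_len tokens.

    Consecutive windows share `overlap` tokens so that sentences cut at a
    boundary still appear in full in one of the windows.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    step = max_len - overlap
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + max_len])
        # Stop once the current window reaches the end of the input.
        if start + max_len >= len(tokens):
            break
        start += step
    return chunks
```

Each chunk can then be decoded and summarized on its own. How to merge the partial summaries is a separate design choice that the original card does not address.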

Intended uses & limitations

Since this model has been trained on scientific papers, it may perform poorly when attempting to summarize other types of content.

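from transformers import pipeline
# Load the fine-tuned checkpoint (downloaded from the Hub on first use).
summarizer = pipeline("text2text-generation", model="AdamCodd/distilbart-sum-arxiv")
paper = "Scientific paper..."
# Truncate inputs that exceed the model's 1024-token limit.
result = summarizer(paper, truncation=True)
# The pipeline returns a list with one dict per input.
print(result[0]["generated_text"])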

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 1270
  • optimizer: AdamW with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 150
  • num_epochs: 1
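To make the scheduler settings above concrete, here is a minimal sketch of a linear schedule with warmup: the learning rate rises linearly from 0 to 3e-05 over the first 150 steps, then decays linearly to 0. The function name `linear_warmup_lr` is hypothetical, and the total of 15,000 steps is an assumption derived from 60,000 samples at batch size 4 for one epoch on a single device without gradient accumulation. The card does not state the actual step count.

```python
def linear_warmup_lr(step, base_lr=3e-5, warmup_steps=150, total_steps=15_000):
    """Learning rate at a given optimizer step for linear warmup plus linear decay.

    total_steps=15_000 is an assumed value (60,000 samples / batch size 4, 1 epoch).
    """
    if step < warmup_steps:
        # Warmup phase: ramp up linearly from 0 to base_lr.
        return base_lr * step / warmup_steps
    # Decay phase: ramp down linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 75, 150, 7_575, 15_000):
        print(step, linear_warmup_lr(step))
```

With these assumed values, warmup covers 1% of training, so almost the entire epoch runs in the decay phase.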

Training results

Key             Value
eval_rouge1     42.185638427734375
eval_rouge2     15.481599807739258
eval_rougeL     24.440900802612305
eval_rougeLsum  24.260608673095703

Framework versions

  • Transformers 4.33.0
  • PyTorch Lightning 2.0.8
  • Tokenizers 0.13.3
