Transformers >= 4.36.1
This model relies on a custom modeling file; you need to add trust_remote_code=True to load it.
See #13467.

LSG ArXiv paper.
The Github conversion script is available at this link.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

tokenizer = AutoTokenizer.from_pretrained("ccdv/lsg-bart-base-16384-mediasum", trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained("ccdv/lsg-bart-base-16384-mediasum", trust_remote_code=True)

text = "Replace by what you want."
pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer, device=0)
generated_text = pipe(
  text, 
  truncation=True, 
  max_length=64, 
  no_repeat_ngram_size=7,
  num_beams=2,
  early_stopping=True
  )
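
The pipeline call above handles tokenization internally. Continuing from that snippet, the sketch below calls model.generate() directly; the 16384-token truncation length matches the model's advertised window, and the generation settings are illustrative rather than recommended values.

# Tokenize a long document, truncating to the 16384-token window handled by this model,
# and move the tensors to the same device as the model.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=16384).to(model.device)

# Beam-search generation; settings mirror the pipeline example above.
output_ids = model.generate(
    **inputs,
    max_length=64,
    num_beams=2,
    no_repeat_ngram_size=7,
    early_stopping=True,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))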

ccdv/lsg-bart-base-16384-mediasum

This model is a fine-tuned version of ccdv/lsg-bart-base-4096-mediasum on the ccdv/mediasum dataset (roberta_prepended configuration).
The model was converted to handle sequences of up to 16384 tokens and fine-tuned accordingly for 1 epoch.
It achieves the following results on the test set:

| Length | Global tokens | Fine-tuning | Block Size | Sparsity | Connexions | R1 | R2 | RL | RLsum |
|---|---|---|---|---|---|---|---|---|---|
| 16384 | 64 | Full | 256 | 0 | 768 | 35.31 | 18.35 | 31.81 | 32.47 |
| 16384 | 1 | Full | 256 | 0 | 768 | 35.21 | 18.20 | 31.73 | 32.37 |
| 16384 | 64 | Global only | 256 | 0 | 768 | 35.22 | 18.08 | 31.54 | 32.21 |
| 16384 | 1 | None | 256 | 0 | 768 | 35.17 | 18.13 | 31.54 | 32.20 |

R1/R2/RL/RLsum denote ROUGE-1, ROUGE-2, ROUGE-L and ROUGE-Lsum scores.

Reference model:

| Length | Global tokens | Fine-tuning | Block Size | Sparsity | Connexions | R1 | R2 | RL | RLsum |
|---|---|---|---|---|---|---|---|---|---|
| 4096 | 1 | - | 256 | 0 | 768 | 35.16 | 18.13 | 31.54 | 32.20 |

Model description

The model relies on Local-Sparse-Global (LSG) attention to handle long sequences.

The model has about 145 million parameters (6 encoder layers, 6 decoder layers).
It is warm-started from ccdv/lsg-bart-base-4096-mediasum, converted to handle long sequences (encoder only), and fine-tuned.
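
As a quick sanity check, the parameter count and the LSG-specific configuration can be inspected once the model is loaded; a minimal sketch, reusing the model object from the usage snippet above:

# Roughly 145M parameters are expected; num_parameters() is a standard PreTrainedModel helper.
print(f"{model.num_parameters():,} parameters")

# The LSG-specific settings (block size, sparsity, global tokens) live on model.config;
# the exact attribute names are defined by the custom modeling file.
print(model.config)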

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-05
  • train_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1.0
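
These settings map roughly onto Seq2SeqTrainingArguments as sketched below. This is an illustrative reconstruction, not the exact training script: the output path is an assumption, and model/dataset loading, the data collator and the Trainer call are omitted.

from transformers import Seq2SeqTrainingArguments

# Illustrative mapping of the reported hyperparameters; optimizer betas/epsilon
# match the Adam defaults (0.9, 0.999, 1e-08), so they are not set explicitly.
training_args = Seq2SeqTrainingArguments(
    output_dir="lsg-bart-base-16384-mediasum",  # assumed output path
    learning_rate=8e-5,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,  # effective train batch size: 8 * 4 = 32
    num_train_epochs=1.0,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
)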

Generation hyperparameters

The following hyperparameters were used during generation:

  • dataset_name: ccdv/mediasum
  • dataset_config_name: roberta_prepended
  • eval_batch_size: 8
  • eval_samples: 10000
  • early_stopping: True
  • ignore_pad_token_for_loss: True
  • length_penalty: 2.0
  • max_length: 128
  • min_length: 3
  • num_beams: 5
  • no_repeat_ngram_size: None
  • seed: 123
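
These correspond to standard model.generate() arguments. The sketch below shows an equivalent call; it assumes inputs is a tokenized batch from the ccdv/mediasum test split and reuses the tokenizer and model objects from the usage example.

# Illustrative reconstruction of the evaluation-time generation settings
# (no_repeat_ngram_size is left at its default since it was None).
output_ids = model.generate(
    **inputs,
    num_beams=5,
    max_length=128,
    min_length=3,
    length_penalty=2.0,
    early_stopping=True,
)
summaries = tokenizer.batch_decode(output_ids, skip_special_tokens=True)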

Framework versions

  • Transformers 4.18.0
  • Pytorch 1.10.1+cu102
  • Datasets 2.1.0
  • Tokenizers 0.11.6