pszemraj committed on
Commit dbcf4ab · 1 Parent(s): 2173531

describe latest checkpoint

Files changed (1)
  1. README.md +3 -2
README.md CHANGED
@@ -56,7 +56,7 @@ parameters:
  min_length: 8
  no_repeat_ngram_size: 3
  early_stopping: True
- repetition_penalty: 3.5
+ repetition_penalty: 1.5
  length_penalty: 0.3
  encoder_no_repeat_ngram_size : 3
  num_beams : 4
@@ -72,12 +72,13 @@ parameters:
 
 _As I make updates to this WIP checkpoint I will post a note here._
 
+- July 8, 2022: add checkpoint with ~4 epochs of training on A100, equating to approx 350 steps of functional batch size 128
 - July 4, 2022: add checkpoint with 6 additional epochs of training with the dataset summary outputs filtered to 1024 **tokens**, resolving prior issue of short summaries.
 
 ## About
 
 - a checkpoint of [Stancld/longt5-tglobal-large-16384-pubmed-3k_steps](https://huggingface.co/Stancld/longt5-tglobal-large-16384-pubmed-3k_steps) trained on `kmfoda/booksum` for about 26 epochs
-- max input lengths during training vary between 8192 and 16384 tokens depending on GPU availability. This checkpoint was **trained with 16384 tokens as the max input length for the final eight epochs**
+- max input lengths during training vary between 8192 and 16384 tokens depending on GPU availability. This checkpoint was **trained with 16384 tokens as the max input length for the final 10+ epochs**
 
 
 ## Comparisons
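
For reference, the YAML block edited in the first hunk holds Hugging Face `transformers` generation kwargs. Below is a minimal inference sketch using those settings; the repo id is a placeholder (this commit page does not name the checkpoint), and the 16384-token truncation mirrors the max input length noted in the README.

```python
# Minimal sketch of summarization with the generation settings from this diff.
# NOTE: the repo id below is hypothetical -- substitute the actual checkpoint
# that this README belongs to.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "pszemraj/long-t5-tglobal-large-booksum-WIP"  # placeholder id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

long_text = "..."  # the long document to summarize

# Truncate to the 16384-token max input length used for the final epochs.
inputs = tokenizer(long_text, truncation=True, max_length=16384, return_tensors="pt")

summary_ids = model.generate(
    **inputs,
    min_length=8,
    no_repeat_ngram_size=3,
    early_stopping=True,
    repetition_penalty=1.5,  # lowered from 3.5 in this commit
    length_penalty=0.3,
    encoder_no_repeat_ngram_size=3,
    num_beams=4,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```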