jordiclive committed
Commit b179c47 • Parent(s): 905c83b
Update README.md

README.md CHANGED
@@ -134,7 +134,7 @@ inference:
       num_beams: 4
 ---
 
-# Multi-purpose Summarizer (Fine-tuned google/flan-t5-xl
+# Multi-purpose Summarizer (Fine-tuned 3B google/flan-t5-xl on several Summarization datasets)
 
 <a href="https://colab.research.google.com/gist/pszemraj/3eba944ddc9fc9a4a1bfb21e83b57620/summarization-token-batching.ipynb">
     <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
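For orientation, a minimal sketch of loading a checkpoint like this with the `transformers` pipeline, in the spirit of the token-batching Colab linked above. The repo id is a guess based on the committer and the 3B size; this diff never states it, and `num_beams=4` simply mirrors the inference setting in the surrounding YAML.

```python
# Minimal, hypothetical usage sketch (not from the card itself).
import torch
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="jordiclive/flan-t5-3b-summarizer",  # assumed repo id, not named in this diff
    device=0 if torch.cuda.is_available() else -1,
    torch_dtype=torch.bfloat16,  # matches the BF16 training mentioned later in the diff
)

text = "Long input document to be summarized goes here."
print(summarizer(text, num_beams=4, max_length=256)[0]["summary_text"])
```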
@@ -202,60 +202,28 @@ If having computing constraints, try the base version [`pszemraj/led-base-book-s
 
 ## Training procedure
 
-- Training
-
+- Training was done in BF16, deepspeed stage 2 for 6 epochs with ROUGE2 monitored on the validation set.
+
 ### Training hyperparameters
 
-#### Initial Three Epochs
-
 The following hyperparameters were used during training:
-- learning_rate:
-- train_batch_size:
-- eval_batch_size:
+- learning_rate: 3e-05
+- train_batch_size: 5
+- eval_batch_size: 8
 - seed: 42
 - distributed_type: multi-GPU
-- gradient_accumulation_steps:
+- gradient_accumulation_steps: 2
+- effective_train_batch_size: 80
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
-
-#### In-between Epochs
-
-Unfortunately, don't have all records on-hand for middle epochs; the following should be representative:
-
-- learning_rate: 4e-05
-- train_batch_size: 2
-- eval_batch_size: 2
-- seed: 42
-- distributed_type: multi-GPU
-- gradient_accumulation_steps: 16
-- total_train_batch_size: 32
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: cosine
-- lr_scheduler_warmup_ratio: 0.05
-- num_epochs: 6 (in addition to prior model)
-
-#### Final Two Epochs
-
-The following hyperparameters were used during training:
-- learning_rate: 2e-05
-- train_batch_size: 1
-- eval_batch_size: 1
-- seed: 42
-- distributed_type: multi-GPU
-- gradient_accumulation_steps: 16
-- total_train_batch_size: 16
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: cosine
-- lr_scheduler_warmup_ratio: 0.03
-- num_epochs: 2 (in addition to prior model)
+- warmup_steps: 2000
+- num_epochs: 10
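The updated hyperparameters report a per-device batch size and an effective batch size separately. A quick, hypothetical decomposition shows how the two relate; the world size of 8 is inferred from the numbers, not stated anywhere in the card.

```python
# Hypothetical sanity check (not from the card): decompose the reported
# effective_train_batch_size of 80. The GPU count is an inference.
train_batch_size = 5             # per-device micro-batch (from the diff)
gradient_accumulation_steps = 2  # from the diff
effective_train_batch_size = 80  # from the diff

world_size = effective_train_batch_size // (train_batch_size * gradient_accumulation_steps)
assert train_batch_size * gradient_accumulation_steps * world_size == effective_train_batch_size
print(world_size)  # 8 -> implies 8 data-parallel workers (assumption)
```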
 
 
 ### Framework versions
 
-- Transformers 4.
-- Pytorch 1.
-
+- Transformers 4.24.0
+- Pytorch 1.9.1+cu111
+- Deepspeed 0.7.4
+- Pytorch-lightning 1.8.1