jordiclive committed
Commit b179c47
1 Parent(s): 905c83b

Update README.md

Files changed (1)
  1. README.md +14 -46
README.md CHANGED
@@ -134,7 +134,7 @@ inference:
  num_beams: 4
  ---

- # Multi-purpose Summarizer (Fine-tuned google/flan-t5-xl (3B) on several Summarization datasets)
+ # Multi-purpose Summarizer (Fine-tuned 3B google/flan-t5-xl on several Summarization datasets)

  <a href="https://colab.research.google.com/gist/pszemraj/3eba944ddc9fc9a4a1bfb21e83b57620/summarization-token-batching.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
@@ -202,60 +202,28 @@ If having computing constraints, try the base version [`pszemraj/led-base-book-s

  ## Training procedure

- - Training completed on the BookSum dataset for 13 total epochs
- - **The final four epochs combined the training and validation sets as 'train' in an effort to increase generalization.**
-
+ - Training was done in BF16, deepspeed stage 2 for 6 epochs with ROUGE2 monitored on the validation set.
+ -
  ### Training hyperparameters

- #### Initial Three Epochs

  The following hyperparameters were used during training:
- - learning_rate: 5e-05
- - train_batch_size: 1
- - eval_batch_size: 1
+ - learning_rate: 3e-05
+ - train_batch_size: 5
+ - eval_batch_size: 8
  - seed: 42
  - distributed_type: multi-GPU
- - gradient_accumulation_steps: 4
- - total_train_batch_size: 4
+ - gradient_accumulation_steps: 2
+ - effective_train_batch_size: 80
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: linear
- - num_epochs: 3
-
- #### In-between Epochs
-
- Unfortunately, don't have all records on-hand for middle epochs; the following should be representative:
-
- - learning_rate: 4e-05
- - train_batch_size: 2
- - eval_batch_size: 2
- - seed: 42
- - distributed_type: multi-GPU
- - gradient_accumulation_steps: 16
- - total_train_batch_size: 32
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_ratio: 0.05
- - num_epochs: 6 (in addition to prior model)
-
- #### Final Two Epochs
-
- The following hyperparameters were used during training:
- - learning_rate: 2e-05
- - train_batch_size: 1
- - eval_batch_size: 1
- - seed: 42
- - distributed_type: multi-GPU
- - gradient_accumulation_steps: 16
- - total_train_batch_size: 16
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_ratio: 0.03
- - num_epochs: 2 (in addition to prior model)
+ - warmup_steps: 2000
+ - num_epochs: 10


  ### Framework versions

- - Transformers 4.19.2
- - Pytorch 1.11.0+cu113
- - Datasets 2.2.2
- - Tokenizers 0.12.1
+ - Transformers 4.24.0
+ - Pytorch 1.9.1+cu111
+ - Deepspeed 0.7.4
+ - Pytorch-lightning 1.8.1
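
The updated hyperparameters list a per-device `train_batch_size` of 5, `gradient_accumulation_steps` of 2, and an `effective_train_batch_size` of 80, which is consistent with 8 data-parallel workers. The diff does not state the GPU count, so that figure is an inference; a quick sanity-check sketch:

```python
# Sanity check of the effective batch size implied by the updated card.
# The device count is an assumption (80 / (5 * 2) = 8); it is not stated in the diff.
per_device_batch_size = 5        # train_batch_size from the updated card
gradient_accumulation_steps = 2  # gradient_accumulation_steps from the updated card
assumed_num_devices = 8          # inferred, not stated in the card

effective_batch_size = per_device_batch_size * gradient_accumulation_steps * assumed_num_devices
print(effective_batch_size)      # 80, matching effective_train_batch_size in the card
```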
 
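
The new title describes a multi-purpose summarizer fine-tuned from `google/flan-t5-xl`, and the card's `inference` parameters set `num_beams: 4`. Below is a minimal usage sketch under those assumptions; the diff does not show the fine-tuned repo id, so the base model id is only a stand-in.

```python
# Minimal usage sketch for a FLAN-T5 summarization checkpoint, mirroring the
# card's num_beams: 4 inference setting. The model id below is a placeholder:
# substitute the fine-tuned repo this card belongs to.
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="google/flan-t5-xl",  # placeholder; not the fine-tuned checkpoint itself
)

text = "Replace this with the long document you want summarized."
print(summarizer(text, num_beams=4, max_length=128)[0]["summary_text"])
```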
 
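
The updated procedure says training ran in BF16 with DeepSpeed stage 2 and ROUGE2 monitored on the validation set, and the framework list includes Pytorch-lightning 1.8.1 and Deepspeed 0.7.4. The sketch below shows one way such a run could be configured with that stack; it is not the author's actual training script, and the LightningModule, dataloaders, metric name (`val_rouge2`), and device count are assumptions.

```python
# Hedged sketch of a PyTorch Lightning 1.8-style trainer matching the card's description:
# BF16 precision, DeepSpeed ZeRO stage 2, gradient accumulation of 2, ROUGE-2 checkpointing.
# `SummarizationModule`, the dataloaders, "val_rouge2", and devices=8 are assumptions.
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint
from pytorch_lightning.strategies import DeepSpeedStrategy

checkpoint_cb = ModelCheckpoint(monitor="val_rouge2", mode="max", save_top_k=1)

trainer = pl.Trainer(
    accelerator="gpu",
    devices=8,                      # assumed from effective_train_batch_size 80 = 5 * 2 * 8
    strategy=DeepSpeedStrategy(stage=2),
    precision="bf16",
    max_epochs=10,                  # num_epochs in the card; the notes say 6 epochs were run
    accumulate_grad_batches=2,      # gradient_accumulation_steps in the card
    callbacks=[checkpoint_cb],
)

# trainer.fit(SummarizationModule(), train_dataloader, val_dataloader)  # placeholders
```

The learning rate (3e-05) and the linear schedule with 2000 warmup steps would live in the module's `configure_optimizers`, which is omitted here.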