jordiclive committed
Commit b179c47 • Parent(s): 905c83b
Update README.md

README.md CHANGED
@@ -134,7 +134,7 @@ inference:
       num_beams: 4
 ---
 
-# Multi-purpose Summarizer (Fine-tuned google/flan-t5-xl
+# Multi-purpose Summarizer (Fine-tuned 3B google/flan-t5-xl on several Summarization datasets)
 
 <a href="https://colab.research.google.com/gist/pszemraj/3eba944ddc9fc9a4a1bfb21e83b57620/summarization-token-batching.ipynb">
     <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
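For orientation, a minimal sketch of loading a checkpoint like this with the `transformers` pipeline, in the spirit of the token-batching Colab linked above. The repo id is a guess based on the committer and the 3B size; this diff never states it, and `num_beams=4` simply mirrors the inference setting in the surrounding YAML.

```python
# Minimal, hypothetical usage sketch (not from the card itself).
import torch
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="jordiclive/flan-t5-3b-summarizer",  # assumed repo id, not named in this diff
    device=0 if torch.cuda.is_available() else -1,
    torch_dtype=torch.bfloat16,  # matches the BF16 training mentioned later in the diff
)

text = "Long input document to be summarized goes here."
print(summarizer(text, num_beams=4, max_length=256)[0]["summary_text"])
```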
@@ -202,60 +202,28 @@ If having computing constraints, try the base version [`pszemraj/led-base-book-s
 
 ## Training procedure
 
-- Training
-
+- Training was done in BF16, deepspeed stage 2 for 6 epochs with ROUGE2 monitored on the validation set.
+
 ### Training hyperparameters
 
-#### Initial Three Epochs
-
 The following hyperparameters were used during training:
-- learning_rate:
-- train_batch_size:
-- eval_batch_size:
+- learning_rate: 3e-05
+- train_batch_size: 5
+- eval_batch_size: 8
 - seed: 42
 - distributed_type: multi-GPU
-- gradient_accumulation_steps:
+- gradient_accumulation_steps: 2
+- effective_train_batch_size: 80
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
-
-#### In-between Epochs
-
-Unfortunately, don't have all records on-hand for middle epochs; the following should be representative:
-
-- learning_rate: 4e-05
-- train_batch_size: 2
-- eval_batch_size: 2
-- seed: 42
-- distributed_type: multi-GPU
-- gradient_accumulation_steps: 16
-- total_train_batch_size: 32
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: cosine
-- lr_scheduler_warmup_ratio: 0.05
-- num_epochs: 6 (in addition to prior model)
-
-#### Final Two Epochs
-
-The following hyperparameters were used during training:
-- learning_rate: 2e-05
-- train_batch_size: 1
-- eval_batch_size: 1
-- seed: 42
-- distributed_type: multi-GPU
-- gradient_accumulation_steps: 16
-- total_train_batch_size: 16
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: cosine
-- lr_scheduler_warmup_ratio: 0.03
-- num_epochs: 2 (in addition to prior model)
+- warmup_steps: 2000
+- num_epochs: 10
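The updated hyperparameters report a per-device batch size and an effective batch size separately. A quick, hypothetical decomposition shows how the two relate; the world size of 8 is inferred from the numbers, not stated anywhere in the card.

```python
# Hypothetical sanity check (not from the card): decompose the reported
# effective_train_batch_size of 80. The GPU count is an inference.
train_batch_size = 5             # per-device micro-batch (from the diff)
gradient_accumulation_steps = 2  # from the diff
effective_train_batch_size = 80  # from the diff

world_size = effective_train_batch_size // (train_batch_size * gradient_accumulation_steps)
assert train_batch_size * gradient_accumulation_steps * world_size == effective_train_batch_size
print(world_size)  # 8 -> implies 8 data-parallel workers (assumption)
```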
 
 
 ### Framework versions
 
-- Transformers 4.
-- Pytorch 1.
-
+- Transformers 4.24.0
+- Pytorch 1.9.1+cu111
+- Deepspeed 0.7.4
+- Pytorch-lightning 1.8.1