Update README.md
### Pretraining

The model was trained on a TPUv3-8 VM, sponsored by the [Google TPU Research Cloud](https://sites.research.google/trc/about/),
for 2,650,000 steps with a batch size of 64 (in total 84B tokens).
The optimizer was AdaFactor, with a learning rate warmup over the first 10K steps at a constant learning rate of 1e-2,
followed by an inverse square root decay of the learning rate.
The model was trained with Google's JAX/Flax-based [t5x framework](https://github.com/google-research/t5x) with help
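
As a concrete illustration of that schedule (constant 1e-2 for the first 10K warmup steps, then inverse square root decay), here is a minimal JAX sketch. It is not the actual t5x/AdaFactor configuration; the constants and the function name are assumptions for illustration only:

```python
import jax.numpy as jnp

# Values taken from the description above; illustrative only.
WARMUP_STEPS = 10_000
BASE_LR = 1e-2

def learning_rate(step):
    """Constant 1e-2 during warmup, then 1e-2 * sqrt(WARMUP_STEPS / step)."""
    step = jnp.maximum(step, 1)
    return jnp.where(
        step < WARMUP_STEPS,
        BASE_LR,
        BASE_LR * jnp.sqrt(WARMUP_STEPS / step),
    )

# Example: at step 40,000 the rate is 1e-2 * sqrt(10_000 / 40_000) = 5e-3.
```

In training code built on optax (rather than t5x's own Adafactor implementation), such a schedule callable could be passed straight to the optimizer, e.g. `optax.adafactor(learning_rate=learning_rate)`.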