See the `clean` directory for the clean script.
## Training
Training was resumed from an earlier checkpoint several times, as can be seen in the training metrics tab (switch to wall time for a better view).

After several hours of training, an error would be raised that we have not been able to identify and fix. As a workaround, the first few resumes started again at step 0 with a differently seeded reshuffling of the data. In the last two resumes, the random seed was fixed and training resumed at the previous step, since a try/except around the failing example allowed training to continue when an error was caused by a single example.
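Both restart modes can be sketched roughly as follows. This is a minimal illustration, not the actual training script: it assumes the dataset fits in a Python list and that the error surfaced while processing a single example, and `BASE_SEED`, `batches`, `run` and `preprocess` are hypothetical names. Calling `run` with a new seed and `start_step=0` mirrors the early resumes; keeping the seed fixed and passing the last completed step mirrors the final two.

```python
import random
from typing import Any, Callable, List, Sequence, Tuple

BASE_SEED = 42  # hypothetical; the README does not state the seed that was used


def batches(examples: Sequence[Any], batch_size: int, seed: int) -> List[List[Any]]:
    """Shuffle with `seed` and split into full batches (a trailing partial batch is dropped)."""
    order = list(examples)
    random.Random(seed).shuffle(order)
    return [order[i:i + batch_size] for i in range(0, len(order) - batch_size + 1, batch_size)]


def run(
    examples: Sequence[Any],
    batch_size: int,
    seed: int,
    start_step: int,
    preprocess: Callable[[Any], Any],
    train_step: Callable[[List[Any]], None],
) -> Tuple[int, int]:
    """Train from `start_step`, dropping individual examples whose preprocessing raises.

    Returns the step reached and the number of skipped examples.
    """
    step, skipped = start_step, 0
    # Same seed -> same order, so slicing skips exactly the batches already trained on.
    for batch in batches(examples, batch_size, seed)[start_step:]:
        processed = []
        for example in batch:
            try:
                processed.append(preprocess(example))
            except Exception:  # one bad example no longer aborts the whole run
                skipped += 1
        if processed:
            train_step(processed)
        step += 1
    return step, skipped


def preprocess(x: int) -> int:
    if x == 7:  # hypothetical stand-in for the unidentified failing example
        raise ValueError("bad example")
    return x * 2


trained: List[List[int]] = []
# Early resumes: a different seed each time, starting again from step 0.
run(range(20), 4, BASE_SEED + 1, 0, preprocess, trained.append)
# Final resumes: fixed seed, continuing after the last completed step.
final_step, skipped = run(range(20), 4, BASE_SEED, 2, preprocess, trained.append)
```

Skipping by slicing the batch list is only valid because the seed, and therefore the data order, is identical across restarts; with a fresh seed the same `start_step` would skip a different subset of the data.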

The final model was trained for 63,000 steps with a batch size of 128, ending with an evaluation loss of 1.79 and an accuracy of 0.64.
A triangular learning rate schedule was used, with a peak learning rate of 0.01 for the first few runs and 0.001 for the last two.
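A triangular schedule ramps the learning rate linearly up to its peak and then linearly back down. The sketch below is a plain-Python illustration under stated assumptions: the warmup length and per-run step counts are not given in this README, so `warmup_steps`, `total_steps` and the values in the example are hypothetical, and the decay is assumed to end at 0.

```python
def triangular_lr(step: int, peak_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at `step`: linear rise from 0 to `peak_lr` over `warmup_steps`,
    then linear decay back to 0 at `total_steps`. Out-of-range steps are clamped."""
    if not 0 < warmup_steps < total_steps:
        raise ValueError("expected 0 < warmup_steps < total_steps")
    step = min(max(step, 0), total_steps)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)


# Hypothetical settings for one of the last two runs (peak 0.001).
schedule = [
    triangular_lr(s, 0.001, warmup_steps=1_000, total_steps=10_000)
    for s in range(0, 10_001, 1_000)
]
```

If the training script is built on Optax (an assumption), the same shape could be expressed by joining two `optax.linear_schedule` pieces with `optax.join_schedules`.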