bert-large-uncased-mrpc / training.log
yoshitomo-matsubara
tuned hyperparameters
1d49bf3
2021-05-25 19:49:54,507 INFO __main__ Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/mrpc/ce/bert_large_uncased.yaml', log='log/glue/mrpc/ce/bert_large_uncased.txt', private_output='leaderboard/glue/standard/bert_large_uncased/', seed=None, student_only=False, task_name='mrpc', test_only=False, world_size=1)
2021-05-25 19:49:54,546 INFO __main__ Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True
2021-05-25 19:50:00,656 WARNING datasets.builder Reusing dataset glue (/root/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
2021-05-25 19:50:02,769 INFO __main__ Start training
2021-05-25 19:50:02,770 INFO torchdistill.models.util [student model]
2021-05-25 19:50:02,770 INFO torchdistill.models.util Using the original student model
2021-05-25 19:50:02,770 INFO torchdistill.core.training Loss = 1.0 * OrgLoss
2021-05-25 19:50:08,870 INFO torchdistill.misc.log Epoch: [0] [ 0/230] eta: 0:02:28 lr: 2.997391304347826e-05 sample/s: 6.235015084699529 loss: 0.7412 (0.7412) time: 0.6463 data: 0.0048 max mem: 5376
2021-05-25 19:50:34,421 INFO torchdistill.misc.log Epoch: [0] [ 50/230] eta: 0:01:32 lr: 2.8669565217391306e-05 sample/s: 8.613480125640086 loss: 0.6172 (0.6453) time: 0.5105 data: 0.0027 max mem: 8071
2021-05-25 19:50:59,496 INFO torchdistill.misc.log Epoch: [0] [100/230] eta: 0:01:05 lr: 2.736521739130435e-05 sample/s: 8.610049872675337 loss: 0.5745 (0.6214) time: 0.4995 data: 0.0027 max mem: 8340
2021-05-25 19:51:24,306 INFO torchdistill.misc.log Epoch: [0] [150/230] eta: 0:00:40 lr: 2.6060869565217393e-05 sample/s: 7.983387230932337 loss: 0.5508 (0.5978) time: 0.4954 data: 0.0027 max mem: 8343
2021-05-25 19:51:49,480 INFO torchdistill.misc.log Epoch: [0] [200/230] eta: 0:00:15 lr: 2.4756521739130433e-05 sample/s: 7.338986194467289 loss: 0.5243 (0.5819) time: 0.5120 data: 0.0026 max mem: 8343
2021-05-25 19:52:03,767 INFO torchdistill.misc.log Epoch: [0] Total time: 0:01:55
2021-05-25 19:52:07,585 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
2021-05-25 19:52:07,585 INFO __main__ Validation: accuracy = 0.8063725490196079, f1 = 0.8596802841918295
2021-05-25 19:52:07,585 INFO __main__ Updating ckpt
2021-05-25 19:52:15,203 INFO torchdistill.misc.log Epoch: [1] [ 0/230] eta: 0:01:52 lr: 2.3973913043478262e-05 sample/s: 8.210228471122164 loss: 0.2772 (0.2772) time: 0.4906 data: 0.0034 max mem: 8343
2021-05-25 19:52:40,453 INFO torchdistill.misc.log Epoch: [1] [ 50/230] eta: 0:01:30 lr: 2.2669565217391306e-05 sample/s: 7.988404891733914 loss: 0.4216 (0.4209) time: 0.5130 data: 0.0027 max mem: 8343
2021-05-25 19:53:05,607 INFO torchdistill.misc.log Epoch: [1] [100/230] eta: 0:01:05 lr: 2.1365217391304346e-05 sample/s: 8.616271431909633 loss: 0.4311 (0.4199) time: 0.4990 data: 0.0027 max mem: 8343
2021-05-25 19:53:30,398 INFO torchdistill.misc.log Epoch: [1] [150/230] eta: 0:00:40 lr: 2.0060869565217393e-05 sample/s: 8.594665095668656 loss: 0.2910 (0.3994) time: 0.4916 data: 0.0027 max mem: 8343
2021-05-25 19:53:55,443 INFO torchdistill.misc.log Epoch: [1] [200/230] eta: 0:00:15 lr: 1.8756521739130436e-05 sample/s: 7.977715750281739 loss: 0.3855 (0.3950) time: 0.4897 data: 0.0027 max mem: 8343
2021-05-25 19:54:09,895 INFO torchdistill.misc.log Epoch: [1] Total time: 0:01:55
2021-05-25 19:54:13,711 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
2021-05-25 19:54:13,712 INFO __main__ Validation: accuracy = 0.8553921568627451, f1 = 0.899488926746167
2021-05-25 19:54:13,712 INFO __main__ Updating ckpt
2021-05-25 19:54:21,849 INFO torchdistill.misc.log Epoch: [2] [ 0/230] eta: 0:02:26 lr: 1.7973913043478262e-05 sample/s: 6.3179051200469365 loss: 0.0593 (0.0593) time: 0.6368 data: 0.0037 max mem: 8343
2021-05-25 19:54:47,318 INFO torchdistill.misc.log Epoch: [2] [ 50/230] eta: 0:01:32 lr: 1.6669565217391305e-05 sample/s: 7.4142856069608625 loss: 0.1097 (0.1770) time: 0.5018 data: 0.0029 max mem: 8343
2021-05-25 19:55:12,346 INFO torchdistill.misc.log Epoch: [2] [100/230] eta: 0:01:05 lr: 1.536521739130435e-05 sample/s: 6.963207990009169 loss: 0.1442 (0.1919) time: 0.4986 data: 0.0026 max mem: 8343
2021-05-25 19:55:37,133 INFO torchdistill.misc.log Epoch: [2] [150/230] eta: 0:00:40 lr: 1.4060869565217393e-05 sample/s: 7.411881388451892 loss: 0.2391 (0.1978) time: 0.5008 data: 0.0027 max mem: 8343
2021-05-25 19:56:02,055 INFO torchdistill.misc.log Epoch: [2] [200/230] eta: 0:00:15 lr: 1.2756521739130435e-05 sample/s: 7.392133605686278 loss: 0.0212 (0.1974) time: 0.5063 data: 0.0027 max mem: 8343
2021-05-25 19:56:16,583 INFO torchdistill.misc.log Epoch: [2] Total time: 0:01:55
2021-05-25 19:56:20,398 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
2021-05-25 19:56:20,399 INFO __main__ Validation: accuracy = 0.8455882352941176, f1 = 0.8877005347593583
2021-05-25 19:56:20,866 INFO torchdistill.misc.log Epoch: [3] [ 0/230] eta: 0:01:47 lr: 1.197391304347826e-05 sample/s: 8.617222924178455 loss: 0.2669 (0.2669) time: 0.4671 data: 0.0029 max mem: 8343
2021-05-25 19:56:45,813 INFO torchdistill.misc.log Epoch: [3] [ 50/230] eta: 0:01:29 lr: 1.0669565217391305e-05 sample/s: 8.612507353673449 loss: 0.0047 (0.1619) time: 0.5022 data: 0.0026 max mem: 8343
2021-05-25 19:57:10,910 INFO torchdistill.misc.log Epoch: [3] [100/230] eta: 0:01:05 lr: 9.365217391304347e-06 sample/s: 7.41079443382604 loss: 0.0007 (0.1258) time: 0.5112 data: 0.0026 max mem: 8343
2021-05-25 19:57:35,937 INFO torchdistill.misc.log Epoch: [3] [150/230] eta: 0:00:40 lr: 8.060869565217392e-06 sample/s: 7.976221401119329 loss: 0.0001 (0.1121) time: 0.5023 data: 0.0027 max mem: 8343
2021-05-25 19:58:00,921 INFO torchdistill.misc.log Epoch: [3] [200/230] eta: 0:00:15 lr: 6.756521739130434e-06 sample/s: 8.603982899951587 loss: 0.0000 (0.1496) time: 0.4897 data: 0.0027 max mem: 8343
2021-05-25 19:58:15,689 INFO torchdistill.misc.log Epoch: [3] Total time: 0:01:55
2021-05-25 19:58:19,502 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
2021-05-25 19:58:19,503 INFO __main__ Validation: accuracy = 0.8504901960784313, f1 = 0.8884826325411335
2021-05-25 19:58:20,004 INFO torchdistill.misc.log Epoch: [4] [ 0/230] eta: 0:01:55 lr: 5.973913043478261e-06 sample/s: 8.038922893074167 loss: 0.0000 (0.0000) time: 0.5005 data: 0.0029 max mem: 8343
2021-05-25 19:58:45,232 INFO torchdistill.misc.log Epoch: [4] [ 50/230] eta: 0:01:30 lr: 4.669565217391304e-06 sample/s: 8.065907185329344 loss: 0.0000 (0.1483) time: 0.5099 data: 0.0028 max mem: 8343
2021-05-25 19:59:10,190 INFO torchdistill.misc.log Epoch: [4] [100/230] eta: 0:01:05 lr: 3.365217391304348e-06 sample/s: 8.05736566867782 loss: 0.0000 (0.1296) time: 0.5081 data: 0.0028 max mem: 8343
2021-05-25 19:59:35,163 INFO torchdistill.misc.log Epoch: [4] [150/230] eta: 0:00:40 lr: 2.0608695652173915e-06 sample/s: 7.483834821575934 loss: 0.0000 (0.1036) time: 0.4983 data: 0.0026 max mem: 8343
2021-05-25 19:59:59,755 INFO torchdistill.misc.log Epoch: [4] [200/230] eta: 0:00:14 lr: 7.565217391304349e-07 sample/s: 8.709057781053877 loss: 0.0000 (0.1062) time: 0.4910 data: 0.0027 max mem: 8343
2021-05-25 20:00:13,848 INFO torchdistill.misc.log Epoch: [4] Total time: 0:01:54
2021-05-25 20:00:17,666 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
2021-05-25 20:00:17,666 INFO __main__ Validation: accuracy = 0.8799019607843137, f1 = 0.9162393162393162
2021-05-25 20:00:17,666 INFO __main__ Updating ckpt
2021-05-25 20:00:31,000 INFO __main__ [Student: bert-large-uncased]
2021-05-25 20:00:34,825 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
2021-05-25 20:00:34,825 INFO __main__ Test: accuracy = 0.8799019607843137, f1 = 0.9162393162393162
2021-05-25 20:00:34,826 INFO __main__ Start prediction for private dataset(s)
2021-05-25 20:00:34,827 INFO __main__ mrpc/test: 1725 samples