yoshitomo-matsubara committed
Commit 1d49bf3 · 1 Parent(s): c94d09f
tuned hyperparameters
- pytorch_model.bin +1 -1
- tokenizer.json +0 -0
- training.log +55 -45
pytorch_model.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:b075bad5f32af3a4f76c8e3344928e4c7cce13047c1ef9781bcd8edc4c54be7a
 size 1340746825
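The file changed here is a Git LFS pointer, not the model weights themselves: three `key value` lines giving the spec version, the object hash, and the size in bytes. As a minimal sketch, the new pointer shown above could be read like this. The helper name `parse_lfs_pointer` is hypothetical and is not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file (one 'key value' pair per line)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid line has the form '<hash-algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


# Pointer content copied from the '+' side of the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:b075bad5f32af3a4f76c8e3344928e4c7cce13047c1ef9781bcd8edc4c54be7a
size 1340746825
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["oid_algo"], pointer["size"])
```

The size is identical on both sides of the diff, so only the hash distinguishes the retrained weights from the previous ones.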
tokenizer.json
CHANGED
The diff for this file is too large to render.
training.log
CHANGED
@@ -1,51 +1,61 @@
|
|
1 |
-
2021-05-
|
2 |
-
2021-05-
|
3 |
Num processes: 1
|
4 |
Process index: 0
|
5 |
Local process index: 0
|
6 |
Device: cuda
|
7 |
Use FP16 precision: True
|
8 |
|
9 |
-
2021-05-
|
10 |
-
2021-05-
|
11 |
-
2021-05-
|
12 |
-
2021-05-
|
13 |
-
2021-05-
|
14 |
-
2021-05-
|
15 |
-
2021-05-
|
16 |
-
2021-05-
|
17 |
-
2021-05-
|
18 |
-
2021-05-
|
19 |
-
2021-05-
|
20 |
-
2021-05-
|
21 |
-
2021-05-
|
22 |
-
2021-05-
|
23 |
-
2021-05-
|
24 |
-
2021-05-
|
25 |
-
2021-05-
|
26 |
-
2021-05-
|
27 |
-
2021-05-
|
28 |
-
2021-05-
|
29 |
-
2021-05-
|
30 |
-
2021-05-
|
31 |
-
2021-05-
|
32 |
-
2021-05-
|
33 |
-
2021-05-
|
34 |
-
2021-05-
|
35 |
-
2021-05-
|
36 |
-
2021-05-
|
37 |
-
2021-05-
|
38 |
-
2021-05-
|
39 |
-
2021-05-
|
40 |
-
2021-05-
|
41 |
-
2021-05-
|
42 |
-
2021-05-
|
43 |
-
2021-05-
|
44 |
-
2021-05-
|
45 |
-
2021-05-
|
46 |
-
2021-05-
|
47 |
-
2021-05-
|
48 |
-
2021-05-
|
49 |
-
2021-05-
|
50 |
-
2021-05-
|
51 |
-
2021-05-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
+2021-05-25 19:49:54,507 INFO __main__ Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/mrpc/ce/bert_large_uncased.yaml', log='log/glue/mrpc/ce/bert_large_uncased.txt', private_output='leaderboard/glue/standard/bert_large_uncased/', seed=None, student_only=False, task_name='mrpc', test_only=False, world_size=1)
+2021-05-25 19:49:54,546 INFO __main__ Distributed environment: NO
 Num processes: 1
 Process index: 0
 Local process index: 0
 Device: cuda
 Use FP16 precision: True

+2021-05-25 19:50:00,656 WARNING datasets.builder Reusing dataset glue (/root/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
+2021-05-25 19:50:02,769 INFO __main__ Start training
+2021-05-25 19:50:02,770 INFO torchdistill.models.util [student model]
+2021-05-25 19:50:02,770 INFO torchdistill.models.util Using the original student model
+2021-05-25 19:50:02,770 INFO torchdistill.core.training Loss = 1.0 * OrgLoss
+2021-05-25 19:50:08,870 INFO torchdistill.misc.log Epoch: [0] [ 0/230] eta: 0:02:28 lr: 2.997391304347826e-05 sample/s: 6.235015084699529 loss: 0.7412 (0.7412) time: 0.6463 data: 0.0048 max mem: 5376
+2021-05-25 19:50:34,421 INFO torchdistill.misc.log Epoch: [0] [ 50/230] eta: 0:01:32 lr: 2.8669565217391306e-05 sample/s: 8.613480125640086 loss: 0.6172 (0.6453) time: 0.5105 data: 0.0027 max mem: 8071
+2021-05-25 19:50:59,496 INFO torchdistill.misc.log Epoch: [0] [100/230] eta: 0:01:05 lr: 2.736521739130435e-05 sample/s: 8.610049872675337 loss: 0.5745 (0.6214) time: 0.4995 data: 0.0027 max mem: 8340
+2021-05-25 19:51:24,306 INFO torchdistill.misc.log Epoch: [0] [150/230] eta: 0:00:40 lr: 2.6060869565217393e-05 sample/s: 7.983387230932337 loss: 0.5508 (0.5978) time: 0.4954 data: 0.0027 max mem: 8343
+2021-05-25 19:51:49,480 INFO torchdistill.misc.log Epoch: [0] [200/230] eta: 0:00:15 lr: 2.4756521739130433e-05 sample/s: 7.338986194467289 loss: 0.5243 (0.5819) time: 0.5120 data: 0.0026 max mem: 8343
+2021-05-25 19:52:03,767 INFO torchdistill.misc.log Epoch: [0] Total time: 0:01:55
+2021-05-25 19:52:07,585 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
+2021-05-25 19:52:07,585 INFO __main__ Validation: accuracy = 0.8063725490196079, f1 = 0.8596802841918295
+2021-05-25 19:52:07,585 INFO __main__ Updating ckpt
+2021-05-25 19:52:15,203 INFO torchdistill.misc.log Epoch: [1] [ 0/230] eta: 0:01:52 lr: 2.3973913043478262e-05 sample/s: 8.210228471122164 loss: 0.2772 (0.2772) time: 0.4906 data: 0.0034 max mem: 8343
+2021-05-25 19:52:40,453 INFO torchdistill.misc.log Epoch: [1] [ 50/230] eta: 0:01:30 lr: 2.2669565217391306e-05 sample/s: 7.988404891733914 loss: 0.4216 (0.4209) time: 0.5130 data: 0.0027 max mem: 8343
+2021-05-25 19:53:05,607 INFO torchdistill.misc.log Epoch: [1] [100/230] eta: 0:01:05 lr: 2.1365217391304346e-05 sample/s: 8.616271431909633 loss: 0.4311 (0.4199) time: 0.4990 data: 0.0027 max mem: 8343
+2021-05-25 19:53:30,398 INFO torchdistill.misc.log Epoch: [1] [150/230] eta: 0:00:40 lr: 2.0060869565217393e-05 sample/s: 8.594665095668656 loss: 0.2910 (0.3994) time: 0.4916 data: 0.0027 max mem: 8343
+2021-05-25 19:53:55,443 INFO torchdistill.misc.log Epoch: [1] [200/230] eta: 0:00:15 lr: 1.8756521739130436e-05 sample/s: 7.977715750281739 loss: 0.3855 (0.3950) time: 0.4897 data: 0.0027 max mem: 8343
+2021-05-25 19:54:09,895 INFO torchdistill.misc.log Epoch: [1] Total time: 0:01:55
+2021-05-25 19:54:13,711 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
+2021-05-25 19:54:13,712 INFO __main__ Validation: accuracy = 0.8553921568627451, f1 = 0.899488926746167
+2021-05-25 19:54:13,712 INFO __main__ Updating ckpt
+2021-05-25 19:54:21,849 INFO torchdistill.misc.log Epoch: [2] [ 0/230] eta: 0:02:26 lr: 1.7973913043478262e-05 sample/s: 6.3179051200469365 loss: 0.0593 (0.0593) time: 0.6368 data: 0.0037 max mem: 8343
+2021-05-25 19:54:47,318 INFO torchdistill.misc.log Epoch: [2] [ 50/230] eta: 0:01:32 lr: 1.6669565217391305e-05 sample/s: 7.4142856069608625 loss: 0.1097 (0.1770) time: 0.5018 data: 0.0029 max mem: 8343
+2021-05-25 19:55:12,346 INFO torchdistill.misc.log Epoch: [2] [100/230] eta: 0:01:05 lr: 1.536521739130435e-05 sample/s: 6.963207990009169 loss: 0.1442 (0.1919) time: 0.4986 data: 0.0026 max mem: 8343
+2021-05-25 19:55:37,133 INFO torchdistill.misc.log Epoch: [2] [150/230] eta: 0:00:40 lr: 1.4060869565217393e-05 sample/s: 7.411881388451892 loss: 0.2391 (0.1978) time: 0.5008 data: 0.0027 max mem: 8343
+2021-05-25 19:56:02,055 INFO torchdistill.misc.log Epoch: [2] [200/230] eta: 0:00:15 lr: 1.2756521739130435e-05 sample/s: 7.392133605686278 loss: 0.0212 (0.1974) time: 0.5063 data: 0.0027 max mem: 8343
+2021-05-25 19:56:16,583 INFO torchdistill.misc.log Epoch: [2] Total time: 0:01:55
+2021-05-25 19:56:20,398 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
+2021-05-25 19:56:20,399 INFO __main__ Validation: accuracy = 0.8455882352941176, f1 = 0.8877005347593583
+2021-05-25 19:56:20,866 INFO torchdistill.misc.log Epoch: [3] [ 0/230] eta: 0:01:47 lr: 1.197391304347826e-05 sample/s: 8.617222924178455 loss: 0.2669 (0.2669) time: 0.4671 data: 0.0029 max mem: 8343
+2021-05-25 19:56:45,813 INFO torchdistill.misc.log Epoch: [3] [ 50/230] eta: 0:01:29 lr: 1.0669565217391305e-05 sample/s: 8.612507353673449 loss: 0.0047 (0.1619) time: 0.5022 data: 0.0026 max mem: 8343
+2021-05-25 19:57:10,910 INFO torchdistill.misc.log Epoch: [3] [100/230] eta: 0:01:05 lr: 9.365217391304347e-06 sample/s: 7.41079443382604 loss: 0.0007 (0.1258) time: 0.5112 data: 0.0026 max mem: 8343
+2021-05-25 19:57:35,937 INFO torchdistill.misc.log Epoch: [3] [150/230] eta: 0:00:40 lr: 8.060869565217392e-06 sample/s: 7.976221401119329 loss: 0.0001 (0.1121) time: 0.5023 data: 0.0027 max mem: 8343
+2021-05-25 19:58:00,921 INFO torchdistill.misc.log Epoch: [3] [200/230] eta: 0:00:15 lr: 6.756521739130434e-06 sample/s: 8.603982899951587 loss: 0.0000 (0.1496) time: 0.4897 data: 0.0027 max mem: 8343
+2021-05-25 19:58:15,689 INFO torchdistill.misc.log Epoch: [3] Total time: 0:01:55
+2021-05-25 19:58:19,502 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
+2021-05-25 19:58:19,503 INFO __main__ Validation: accuracy = 0.8504901960784313, f1 = 0.8884826325411335
+2021-05-25 19:58:20,004 INFO torchdistill.misc.log Epoch: [4] [ 0/230] eta: 0:01:55 lr: 5.973913043478261e-06 sample/s: 8.038922893074167 loss: 0.0000 (0.0000) time: 0.5005 data: 0.0029 max mem: 8343
+2021-05-25 19:58:45,232 INFO torchdistill.misc.log Epoch: [4] [ 50/230] eta: 0:01:30 lr: 4.669565217391304e-06 sample/s: 8.065907185329344 loss: 0.0000 (0.1483) time: 0.5099 data: 0.0028 max mem: 8343
+2021-05-25 19:59:10,190 INFO torchdistill.misc.log Epoch: [4] [100/230] eta: 0:01:05 lr: 3.365217391304348e-06 sample/s: 8.05736566867782 loss: 0.0000 (0.1296) time: 0.5081 data: 0.0028 max mem: 8343
+2021-05-25 19:59:35,163 INFO torchdistill.misc.log Epoch: [4] [150/230] eta: 0:00:40 lr: 2.0608695652173915e-06 sample/s: 7.483834821575934 loss: 0.0000 (0.1036) time: 0.4983 data: 0.0026 max mem: 8343
+2021-05-25 19:59:59,755 INFO torchdistill.misc.log Epoch: [4] [200/230] eta: 0:00:14 lr: 7.565217391304349e-07 sample/s: 8.709057781053877 loss: 0.0000 (0.1062) time: 0.4910 data: 0.0027 max mem: 8343
+2021-05-25 20:00:13,848 INFO torchdistill.misc.log Epoch: [4] Total time: 0:01:54
+2021-05-25 20:00:17,666 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
+2021-05-25 20:00:17,666 INFO __main__ Validation: accuracy = 0.8799019607843137, f1 = 0.9162393162393162
+2021-05-25 20:00:17,666 INFO __main__ Updating ckpt
+2021-05-25 20:00:31,000 INFO __main__ [Student: bert-large-uncased]
+2021-05-25 20:00:34,825 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
+2021-05-25 20:00:34,825 INFO __main__ Test: accuracy = 0.8799019607843137, f1 = 0.9162393162393162
+2021-05-25 20:00:34,826 INFO __main__ Start prediction for private dataset(s)
+2021-05-25 20:00:34,827 INFO __main__ mrpc/test: 1725 samples
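The `lr` values in the new training.log are consistent with a linear decay to zero over 5 epochs of 230 iterations each (1150 steps), with no warmup. The sketch below checks that reading against a few logged values. The base learning rate of 3e-5 and the schedule shape are inferences from the logged numbers, not settings stated anywhere in this commit, and `linear_decay_lr` is a hypothetical helper rather than a torchdistill function.

```python
import math

BASE_LR = 3e-5          # assumption: inferred from the logged values, not stated in the log
STEPS_PER_EPOCH = 230   # from the "[  0/230]" iteration counters
NUM_EPOCHS = 5          # epochs 0 through 4 appear in the log
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS


def linear_decay_lr(epoch: int, iteration: int) -> float:
    """Learning rate after the step at (epoch, iteration), assuming linear decay to zero."""
    steps_done = epoch * STEPS_PER_EPOCH + iteration + 1
    return BASE_LR * (TOTAL_STEPS - steps_done) / TOTAL_STEPS


# (epoch, iteration) -> lr value copied from the log lines above.
LOGGED = {
    (0, 0): 2.997391304347826e-05,
    (0, 50): 2.8669565217391306e-05,
    (1, 0): 2.3973913043478262e-05,
    (4, 200): 7.565217391304349e-07,
}

for (epoch, iteration), logged in LOGGED.items():
    predicted = linear_decay_lr(epoch, iteration)
    status = "match" if math.isclose(predicted, logged, rel_tol=1e-9) else "MISMATCH"
    print(f"epoch {epoch} iter {iteration:3d}: predicted {predicted:.6e} ({status})")
```

Under the same assumption the rate reaches zero at the last iteration of epoch 4, which fits the final logged value of about 7.6e-07 at iteration 200.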