yoshitomo-matsubara committed on
Commit
1d49bf3
1 Parent(s): c94d09f

tuned hyperparameters
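
The commit message does not say which hyperparameters changed. The `lr` values in training.log are consistent with a linear decay to zero over all training steps, with a peak rate of 2e-5 over 5 x 115 steps before this commit and 3e-5 over 5 x 230 steps after it. Both the schedule form and the peak rates are assumptions inferred from the logged numbers, not values read from the YAML config, and `linear_decay_lr` is a hypothetical helper. A minimal sketch of that arithmetic:

```python
def linear_decay_lr(base_lr, step, total_steps):
    """Learning rate after `step` scheduler updates under linear decay to zero."""
    return base_lr * (1.0 - step / total_steps)


# Assumed values, inferred from the log rather than read from the config:
# peak rate 2e-5 over 5 * 115 steps before, 3e-5 over 5 * 230 steps after.
old_first = linear_decay_lr(2e-5, step=1, total_steps=5 * 115)
new_first = linear_decay_lr(3e-5, step=1, total_steps=5 * 230)

print(old_first)  # compare with the logged 1.996521739130435e-05
print(new_first)  # compare with the logged 2.997391304347826e-05
```

The doubling of iterations per epoch (115 to 230) would also be consistent with a halved batch size, but the log does not record the batch size directly.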

Files changed (3)
  1. pytorch_model.bin +1 -1
  2. tokenizer.json +0 -0
  3. training.log +55 -45
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:b527b40cea6a770354120ce1112fe542240e4d050d540755f70fb0cf2da02592
+ oid sha256:b075bad5f32af3a4f76c8e3344928e4c7cce13047c1ef9781bcd8edc4c54be7a
  size 1340746825
tokenizer.json CHANGED
The diff for this file is too large to render.
 
training.log CHANGED
@@ -1,51 +1,61 @@
- 2021-05-21 20:55:41,099 INFO __main__ Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/mrpc/ce/bert_large_uncased.yaml', log='log/glue/mrpc/ce/bert_large_uncased.txt', private_output='leaderboard/glue/standard/bert_large_uncased/', seed=None, student_only=False, task_name='mrpc', test_only=False, world_size=1)
- 2021-05-21 20:55:41,134 INFO __main__ Distributed environment: NO
  Num processes: 1
  Process index: 0
  Local process index: 0
  Device: cuda
  Use FP16 precision: True
 
- 2021-05-21 20:56:09,963 INFO __main__ Start training
- 2021-05-21 20:56:09,964 INFO torchdistill.models.util [student model]
- 2021-05-21 20:56:09,964 INFO torchdistill.models.util Using the original student model
- 2021-05-21 20:56:09,964 INFO torchdistill.core.training Loss = 1.0 * OrgLoss
- 2021-05-21 20:56:13,340 INFO torchdistill.misc.log Epoch: [0] [ 0/115] eta: 0:01:32 lr: 1.996521739130435e-05 sample/s: 5.0108644488568075 loss: 0.7117 (0.7117) time: 0.8047 data: 0.0065 max mem: 5401
- 2021-05-21 20:56:57,890 INFO torchdistill.misc.log Epoch: [0] [ 50/115] eta: 0:00:57 lr: 1.822608695652174e-05 sample/s: 4.658316585970199 loss: 0.6187 (0.6438) time: 0.8854 data: 0.0047 max mem: 10945
- 2021-05-21 20:57:42,797 INFO torchdistill.misc.log Epoch: [0] [100/115] eta: 0:00:13 lr: 1.6486956521739132e-05 sample/s: 4.6675958893857725 loss: 0.6068 (0.6307) time: 0.8995 data: 0.0046 max mem: 10946
- 2021-05-21 20:57:55,065 INFO torchdistill.misc.log Epoch: [0] Total time: 0:01:42
- 2021-05-21 20:57:58,697 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
- 2021-05-21 20:57:58,698 INFO __main__ Validation: accuracy = 0.6911764705882353, f1 = 0.8141592920353982
- 2021-05-21 20:57:58,698 INFO __main__ Updating ckpt
- 2021-05-21 20:58:04,096 INFO torchdistill.misc.log Epoch: [1] [ 0/115] eta: 0:01:41 lr: 1.596521739130435e-05 sample/s: 4.550537109441219 loss: 0.6108 (0.6108) time: 0.8846 data: 0.0056 max mem: 10946
- 2021-05-21 20:58:48,505 INFO torchdistill.misc.log Epoch: [1] [ 50/115] eta: 0:00:57 lr: 1.4226086956521742e-05 sample/s: 5.072097714849413 loss: 0.5569 (0.5700) time: 0.9032 data: 0.0046 max mem: 10946
- 2021-05-21 20:59:33,555 INFO torchdistill.misc.log Epoch: [1] [100/115] eta: 0:00:13 lr: 1.2486956521739131e-05 sample/s: 4.004992031655668 loss: 0.5414 (0.5664) time: 0.8920 data: 0.0046 max mem: 10946
- 2021-05-21 20:59:45,989 INFO torchdistill.misc.log Epoch: [1] Total time: 0:01:42
- 2021-05-21 20:59:49,619 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
- 2021-05-21 20:59:49,620 INFO __main__ Validation: accuracy = 0.7524509803921569, f1 = 0.8378812199036919
- 2021-05-21 20:59:49,620 INFO __main__ Updating ckpt
- 2021-05-21 20:59:55,254 INFO torchdistill.misc.log Epoch: [2] [ 0/115] eta: 0:01:41 lr: 1.196521739130435e-05 sample/s: 4.561543612792821 loss: 0.4717 (0.4717) time: 0.8828 data: 0.0059 max mem: 10946
- 2021-05-21 21:00:39,815 INFO torchdistill.misc.log Epoch: [2] [ 50/115] eta: 0:00:57 lr: 1.022608695652174e-05 sample/s: 4.008234729448237 loss: 0.5461 (0.5267) time: 0.8850 data: 0.0046 max mem: 10946
- 2021-05-21 21:01:24,369 INFO torchdistill.misc.log Epoch: [2] [100/115] eta: 0:00:13 lr: 8.48695652173913e-06 sample/s: 5.080750089185291 loss: 0.4628 (0.5161) time: 0.8997 data: 0.0047 max mem: 10946
- 2021-05-21 21:01:36,665 INFO torchdistill.misc.log Epoch: [2] Total time: 0:01:42
- 2021-05-21 21:01:40,295 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
- 2021-05-21 21:01:40,295 INFO __main__ Validation: accuracy = 0.7696078431372549, f1 = 0.8469055374592835
- 2021-05-21 21:01:40,295 INFO __main__ Updating ckpt
- 2021-05-21 21:01:45,871 INFO torchdistill.misc.log Epoch: [3] [ 0/115] eta: 0:01:41 lr: 7.965217391304349e-06 sample/s: 4.552310192381945 loss: 0.4230 (0.4230) time: 0.8846 data: 0.0059 max mem: 10946
- 2021-05-21 21:02:30,881 INFO torchdistill.misc.log Epoch: [3] [ 50/115] eta: 0:00:58 lr: 6.226086956521739e-06 sample/s: 4.298450072186489 loss: 0.4338 (0.4609) time: 0.8961 data: 0.0048 max mem: 10946
- 2021-05-21 21:03:15,449 INFO torchdistill.misc.log Epoch: [3] [100/115] eta: 0:00:13 lr: 4.486956521739131e-06 sample/s: 4.298780486242516 loss: 0.4697 (0.4590) time: 0.8995 data: 0.0047 max mem: 10946
- 2021-05-21 21:03:27,486 INFO torchdistill.misc.log Epoch: [3] Total time: 0:01:42
- 2021-05-21 21:03:31,118 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
- 2021-05-21 21:03:31,118 INFO __main__ Validation: accuracy = 0.7622549019607843, f1 = 0.8186915887850468
- 2021-05-21 21:03:31,983 INFO torchdistill.misc.log Epoch: [4] [ 0/115] eta: 0:01:39 lr: 3.965217391304348e-06 sample/s: 4.657085579681301 loss: 0.3907 (0.3907) time: 0.8638 data: 0.0048 max mem: 10946
- 2021-05-21 21:04:17,067 INFO torchdistill.misc.log Epoch: [4] [ 50/115] eta: 0:00:58 lr: 2.2260869565217395e-06 sample/s: 4.657376462576989 loss: 0.3632 (0.3954) time: 0.9041 data: 0.0048 max mem: 10946
- 2021-05-21 21:05:02,138 INFO torchdistill.misc.log Epoch: [4] [100/115] eta: 0:00:13 lr: 4.869565217391305e-07 sample/s: 4.656771466962662 loss: 0.4057 (0.3937) time: 0.8889 data: 0.0047 max mem: 10946
- 2021-05-21 21:05:14,248 INFO torchdistill.misc.log Epoch: [4] Total time: 0:01:43
- 2021-05-21 21:05:17,878 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
- 2021-05-21 21:05:17,879 INFO __main__ Validation: accuracy = 0.7941176470588235, f1 = 0.8571428571428571
- 2021-05-21 21:05:17,879 INFO __main__ Updating ckpt
- 2021-05-21 21:05:28,554 INFO __main__ [Student: bert-large-uncased]
- 2021-05-21 21:05:32,209 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
- 2021-05-21 21:05:32,209 INFO __main__ Test: accuracy = 0.7941176470588235, f1 = 0.8571428571428571
- 2021-05-21 21:05:32,210 INFO __main__ Start prediction for private dataset(s)
- 2021-05-21 21:05:32,211 INFO __main__ mrpc/test: 1725 samples
+ 2021-05-25 19:49:54,507 INFO __main__ Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/mrpc/ce/bert_large_uncased.yaml', log='log/glue/mrpc/ce/bert_large_uncased.txt', private_output='leaderboard/glue/standard/bert_large_uncased/', seed=None, student_only=False, task_name='mrpc', test_only=False, world_size=1)
+ 2021-05-25 19:49:54,546 INFO __main__ Distributed environment: NO
  Num processes: 1
  Process index: 0
  Local process index: 0
  Device: cuda
  Use FP16 precision: True
 
+ 2021-05-25 19:50:00,656 WARNING datasets.builder Reusing dataset glue (/root/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
+ 2021-05-25 19:50:02,769 INFO __main__ Start training
+ 2021-05-25 19:50:02,770 INFO torchdistill.models.util [student model]
+ 2021-05-25 19:50:02,770 INFO torchdistill.models.util Using the original student model
+ 2021-05-25 19:50:02,770 INFO torchdistill.core.training Loss = 1.0 * OrgLoss
+ 2021-05-25 19:50:08,870 INFO torchdistill.misc.log Epoch: [0] [ 0/230] eta: 0:02:28 lr: 2.997391304347826e-05 sample/s: 6.235015084699529 loss: 0.7412 (0.7412) time: 0.6463 data: 0.0048 max mem: 5376
+ 2021-05-25 19:50:34,421 INFO torchdistill.misc.log Epoch: [0] [ 50/230] eta: 0:01:32 lr: 2.8669565217391306e-05 sample/s: 8.613480125640086 loss: 0.6172 (0.6453) time: 0.5105 data: 0.0027 max mem: 8071
+ 2021-05-25 19:50:59,496 INFO torchdistill.misc.log Epoch: [0] [100/230] eta: 0:01:05 lr: 2.736521739130435e-05 sample/s: 8.610049872675337 loss: 0.5745 (0.6214) time: 0.4995 data: 0.0027 max mem: 8340
+ 2021-05-25 19:51:24,306 INFO torchdistill.misc.log Epoch: [0] [150/230] eta: 0:00:40 lr: 2.6060869565217393e-05 sample/s: 7.983387230932337 loss: 0.5508 (0.5978) time: 0.4954 data: 0.0027 max mem: 8343
+ 2021-05-25 19:51:49,480 INFO torchdistill.misc.log Epoch: [0] [200/230] eta: 0:00:15 lr: 2.4756521739130433e-05 sample/s: 7.338986194467289 loss: 0.5243 (0.5819) time: 0.5120 data: 0.0026 max mem: 8343
+ 2021-05-25 19:52:03,767 INFO torchdistill.misc.log Epoch: [0] Total time: 0:01:55
+ 2021-05-25 19:52:07,585 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
+ 2021-05-25 19:52:07,585 INFO __main__ Validation: accuracy = 0.8063725490196079, f1 = 0.8596802841918295
+ 2021-05-25 19:52:07,585 INFO __main__ Updating ckpt
+ 2021-05-25 19:52:15,203 INFO torchdistill.misc.log Epoch: [1] [ 0/230] eta: 0:01:52 lr: 2.3973913043478262e-05 sample/s: 8.210228471122164 loss: 0.2772 (0.2772) time: 0.4906 data: 0.0034 max mem: 8343
+ 2021-05-25 19:52:40,453 INFO torchdistill.misc.log Epoch: [1] [ 50/230] eta: 0:01:30 lr: 2.2669565217391306e-05 sample/s: 7.988404891733914 loss: 0.4216 (0.4209) time: 0.5130 data: 0.0027 max mem: 8343
+ 2021-05-25 19:53:05,607 INFO torchdistill.misc.log Epoch: [1] [100/230] eta: 0:01:05 lr: 2.1365217391304346e-05 sample/s: 8.616271431909633 loss: 0.4311 (0.4199) time: 0.4990 data: 0.0027 max mem: 8343
+ 2021-05-25 19:53:30,398 INFO torchdistill.misc.log Epoch: [1] [150/230] eta: 0:00:40 lr: 2.0060869565217393e-05 sample/s: 8.594665095668656 loss: 0.2910 (0.3994) time: 0.4916 data: 0.0027 max mem: 8343
+ 2021-05-25 19:53:55,443 INFO torchdistill.misc.log Epoch: [1] [200/230] eta: 0:00:15 lr: 1.8756521739130436e-05 sample/s: 7.977715750281739 loss: 0.3855 (0.3950) time: 0.4897 data: 0.0027 max mem: 8343
+ 2021-05-25 19:54:09,895 INFO torchdistill.misc.log Epoch: [1] Total time: 0:01:55
+ 2021-05-25 19:54:13,711 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
+ 2021-05-25 19:54:13,712 INFO __main__ Validation: accuracy = 0.8553921568627451, f1 = 0.899488926746167
+ 2021-05-25 19:54:13,712 INFO __main__ Updating ckpt
+ 2021-05-25 19:54:21,849 INFO torchdistill.misc.log Epoch: [2] [ 0/230] eta: 0:02:26 lr: 1.7973913043478262e-05 sample/s: 6.3179051200469365 loss: 0.0593 (0.0593) time: 0.6368 data: 0.0037 max mem: 8343
+ 2021-05-25 19:54:47,318 INFO torchdistill.misc.log Epoch: [2] [ 50/230] eta: 0:01:32 lr: 1.6669565217391305e-05 sample/s: 7.4142856069608625 loss: 0.1097 (0.1770) time: 0.5018 data: 0.0029 max mem: 8343
+ 2021-05-25 19:55:12,346 INFO torchdistill.misc.log Epoch: [2] [100/230] eta: 0:01:05 lr: 1.536521739130435e-05 sample/s: 6.963207990009169 loss: 0.1442 (0.1919) time: 0.4986 data: 0.0026 max mem: 8343
+ 2021-05-25 19:55:37,133 INFO torchdistill.misc.log Epoch: [2] [150/230] eta: 0:00:40 lr: 1.4060869565217393e-05 sample/s: 7.411881388451892 loss: 0.2391 (0.1978) time: 0.5008 data: 0.0027 max mem: 8343
+ 2021-05-25 19:56:02,055 INFO torchdistill.misc.log Epoch: [2] [200/230] eta: 0:00:15 lr: 1.2756521739130435e-05 sample/s: 7.392133605686278 loss: 0.0212 (0.1974) time: 0.5063 data: 0.0027 max mem: 8343
+ 2021-05-25 19:56:16,583 INFO torchdistill.misc.log Epoch: [2] Total time: 0:01:55
+ 2021-05-25 19:56:20,398 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
+ 2021-05-25 19:56:20,399 INFO __main__ Validation: accuracy = 0.8455882352941176, f1 = 0.8877005347593583
+ 2021-05-25 19:56:20,866 INFO torchdistill.misc.log Epoch: [3] [ 0/230] eta: 0:01:47 lr: 1.197391304347826e-05 sample/s: 8.617222924178455 loss: 0.2669 (0.2669) time: 0.4671 data: 0.0029 max mem: 8343
+ 2021-05-25 19:56:45,813 INFO torchdistill.misc.log Epoch: [3] [ 50/230] eta: 0:01:29 lr: 1.0669565217391305e-05 sample/s: 8.612507353673449 loss: 0.0047 (0.1619) time: 0.5022 data: 0.0026 max mem: 8343
+ 2021-05-25 19:57:10,910 INFO torchdistill.misc.log Epoch: [3] [100/230] eta: 0:01:05 lr: 9.365217391304347e-06 sample/s: 7.41079443382604 loss: 0.0007 (0.1258) time: 0.5112 data: 0.0026 max mem: 8343
+ 2021-05-25 19:57:35,937 INFO torchdistill.misc.log Epoch: [3] [150/230] eta: 0:00:40 lr: 8.060869565217392e-06 sample/s: 7.976221401119329 loss: 0.0001 (0.1121) time: 0.5023 data: 0.0027 max mem: 8343
+ 2021-05-25 19:58:00,921 INFO torchdistill.misc.log Epoch: [3] [200/230] eta: 0:00:15 lr: 6.756521739130434e-06 sample/s: 8.603982899951587 loss: 0.0000 (0.1496) time: 0.4897 data: 0.0027 max mem: 8343
+ 2021-05-25 19:58:15,689 INFO torchdistill.misc.log Epoch: [3] Total time: 0:01:55
+ 2021-05-25 19:58:19,502 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
+ 2021-05-25 19:58:19,503 INFO __main__ Validation: accuracy = 0.8504901960784313, f1 = 0.8884826325411335
+ 2021-05-25 19:58:20,004 INFO torchdistill.misc.log Epoch: [4] [ 0/230] eta: 0:01:55 lr: 5.973913043478261e-06 sample/s: 8.038922893074167 loss: 0.0000 (0.0000) time: 0.5005 data: 0.0029 max mem: 8343
+ 2021-05-25 19:58:45,232 INFO torchdistill.misc.log Epoch: [4] [ 50/230] eta: 0:01:30 lr: 4.669565217391304e-06 sample/s: 8.065907185329344 loss: 0.0000 (0.1483) time: 0.5099 data: 0.0028 max mem: 8343
+ 2021-05-25 19:59:10,190 INFO torchdistill.misc.log Epoch: [4] [100/230] eta: 0:01:05 lr: 3.365217391304348e-06 sample/s: 8.05736566867782 loss: 0.0000 (0.1296) time: 0.5081 data: 0.0028 max mem: 8343
+ 2021-05-25 19:59:35,163 INFO torchdistill.misc.log Epoch: [4] [150/230] eta: 0:00:40 lr: 2.0608695652173915e-06 sample/s: 7.483834821575934 loss: 0.0000 (0.1036) time: 0.4983 data: 0.0026 max mem: 8343
+ 2021-05-25 19:59:59,755 INFO torchdistill.misc.log Epoch: [4] [200/230] eta: 0:00:14 lr: 7.565217391304349e-07 sample/s: 8.709057781053877 loss: 0.0000 (0.1062) time: 0.4910 data: 0.0027 max mem: 8343
+ 2021-05-25 20:00:13,848 INFO torchdistill.misc.log Epoch: [4] Total time: 0:01:54
+ 2021-05-25 20:00:17,666 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
+ 2021-05-25 20:00:17,666 INFO __main__ Validation: accuracy = 0.8799019607843137, f1 = 0.9162393162393162
+ 2021-05-25 20:00:17,666 INFO __main__ Updating ckpt
+ 2021-05-25 20:00:31,000 INFO __main__ [Student: bert-large-uncased]
+ 2021-05-25 20:00:34,825 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow
+ 2021-05-25 20:00:34,825 INFO __main__ Test: accuracy = 0.8799019607843137, f1 = 0.9162393162393162
+ 2021-05-25 20:00:34,826 INFO __main__ Start prediction for private dataset(s)
+ 2021-05-25 20:00:34,827 INFO __main__ mrpc/test: 1725 samples
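
To compare the two runs without reading every line, the per-epoch validation metrics can be pulled out of training.log with a regular expression. The sketch below is only an illustration: `validation_metrics` and `best_epoch` are hypothetical helper names, the pattern assumes the exact `Validation: accuracy = ..., f1 = ...` wording seen in the log, and the sample text is the first two validation lines of the new run.

```python
import re

# Matches lines such as:
#   "... INFO __main__ Validation: accuracy = 0.80, f1 = 0.85"
METRIC_RE = re.compile(r"Validation: accuracy = ([0-9.]+), f1 = ([0-9.]+)")


def validation_metrics(log_text):
    """Return one (accuracy, f1) tuple per logged validation pass, in order."""
    return [(float(acc), float(f1)) for acc, f1 in METRIC_RE.findall(log_text)]


def best_epoch(metrics):
    """Index of the epoch with the highest accuracy (first one on ties)."""
    return max(range(len(metrics)), key=lambda i: metrics[i][0])


# Two validation lines copied from the new training.log.
sample = (
    "2021-05-25 19:52:07,585 INFO __main__ Validation: "
    "accuracy = 0.8063725490196079, f1 = 0.8596802841918295\n"
    "2021-05-25 19:54:13,712 INFO __main__ Validation: "
    "accuracy = 0.8553921568627451, f1 = 0.899488926746167\n"
)

metrics = validation_metrics(sample)
print(metrics)
print("best epoch:", best_epoch(metrics))
```

Run over the full log, the same helpers would return five tuples per run, one per epoch.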