2023-10-16 20:18:22,608 ----------------------------------------------------------------------------------------------------
2023-10-16 20:18:22,609 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(32001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-16 20:18:22,609 ----------------------------------------------------------------------------------------------------
2023-10-16 20:18:22,609 MultiCorpus: 1085 train + 148 dev + 364 test sentences
 - NER_HIPE_2022 Corpus: 1085 train + 148 dev + 364 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/sv/with_doc_seperator
2023-10-16 20:18:22,609 ----------------------------------------------------------------------------------------------------
2023-10-16 20:18:22,609 Train:  1085 sentences
2023-10-16 20:18:22,609         (train_with_dev=False, train_with_test=False)
2023-10-16 20:18:22,609 ----------------------------------------------------------------------------------------------------
2023-10-16 20:18:22,610 Training Params:
2023-10-16 20:18:22,610  - learning_rate: "5e-05"
2023-10-16 20:18:22,610  - mini_batch_size: "8"
2023-10-16 20:18:22,610  - max_epochs: "10"
2023-10-16 20:18:22,610  - shuffle: "True"
2023-10-16 20:18:22,610 ----------------------------------------------------------------------------------------------------
2023-10-16 20:18:22,610 Plugins:
2023-10-16 20:18:22,610  - LinearScheduler | warmup_fraction: '0.1'
2023-10-16 20:18:22,610 ----------------------------------------------------------------------------------------------------
2023-10-16 20:18:22,610 Final evaluation on model from best epoch (best-model.pt)
2023-10-16 20:18:22,610  - metric: "('micro avg', 'f1-score')"
2023-10-16 20:18:22,610 ----------------------------------------------------------------------------------------------------
2023-10-16 20:18:22,610 Computation:
2023-10-16 20:18:22,610  - compute on device: cuda:0
2023-10-16 20:18:22,610  - embedding storage: none
2023-10-16 20:18:22,610 ----------------------------------------------------------------------------------------------------
2023-10-16 20:18:22,610 Model training base path: "hmbench-newseye/sv-dbmdz/bert-base-historic-multilingual-cased-bs8-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-4"
2023-10-16 20:18:22,610 ----------------------------------------------------------------------------------------------------
2023-10-16 20:18:22,610 ----------------------------------------------------------------------------------------------------
2023-10-16 20:18:23,900 epoch 1 - iter 13/136 - loss 2.93136957 - time (sec): 1.29 - samples/sec: 3687.97 - lr: 0.000004 - momentum: 0.000000
2023-10-16 20:18:25,583 epoch 1 - iter 26/136 - loss 2.61761304 - time (sec): 2.97 - samples/sec: 3560.79 - lr: 0.000009 - momentum: 0.000000
2023-10-16 20:18:27,189 epoch 1 - iter 39/136 - loss 2.10292372 - time (sec): 4.58 - samples/sec: 3480.42 - lr: 0.000014 - momentum: 0.000000
2023-10-16 20:18:28,654 epoch 1 - iter 52/136 - loss 1.69932419 - time (sec): 6.04 - samples/sec: 3541.03 - lr: 0.000019 - momentum: 0.000000
2023-10-16 20:18:30,218 epoch 1 - iter 65/136 - loss 1.46083257 - time (sec): 7.61 - samples/sec: 3509.89 - lr: 0.000024 - momentum: 0.000000
2023-10-16 20:18:31,384 epoch 1 - iter 78/136 - loss 1.32435059 - time (sec): 8.77 - samples/sec: 3550.71 - lr: 0.000028 - momentum: 0.000000
2023-10-16 20:18:32,960 epoch 1 - iter 91/136 - loss 1.19352850 - time (sec): 10.35 - samples/sec: 3485.30 - lr: 0.000033 - momentum: 0.000000
2023-10-16 20:18:34,200 epoch 1 - iter 104/136 - loss 1.10278115 - time (sec): 11.59 - samples/sec: 3501.34 - lr: 0.000038 - momentum: 0.000000
2023-10-16 20:18:35,599 epoch 1 - iter 117/136 - loss 1.01082586 - time (sec): 12.99 - samples/sec: 3499.00 - lr: 0.000043 - momentum: 0.000000
2023-10-16 20:18:36,934 epoch 1 - iter 130/136 - loss 0.93871330 - time (sec): 14.32 - samples/sec: 3480.74 - lr: 0.000047 - momentum: 0.000000
2023-10-16 20:18:37,489 ----------------------------------------------------------------------------------------------------
2023-10-16 20:18:37,489 EPOCH 1 done: loss 0.9142 - lr: 0.000047
2023-10-16 20:18:38,542 DEV : loss 0.17539678514003754 - f1-score (micro avg)  0.6643
2023-10-16 20:18:38,546 saving best model
2023-10-16 20:18:38,883 ----------------------------------------------------------------------------------------------------
2023-10-16 20:18:40,211 epoch 2 - iter 13/136 - loss 0.22301634 - time (sec): 1.33 - samples/sec: 3488.45 - lr: 0.000050 - momentum: 0.000000
2023-10-16 20:18:41,522 epoch 2 - iter 26/136 - loss 0.18431598 - time (sec): 2.64 - samples/sec: 3631.46 - lr: 0.000049 - momentum: 0.000000
2023-10-16 20:18:42,978 epoch 2 - iter 39/136 - loss 0.16140995 - time (sec): 4.09 - samples/sec: 3598.49 - lr: 0.000048 - momentum: 0.000000
2023-10-16 20:18:44,206 epoch 2 - iter 52/136 - loss 0.17264568 - time (sec): 5.32 - samples/sec: 3718.67 - lr: 0.000048 - momentum: 0.000000
2023-10-16 20:18:45,431 epoch 2 - iter 65/136 - loss 0.18654532 - time (sec): 6.55 - samples/sec: 3641.40 - lr: 0.000047 - momentum: 0.000000
2023-10-16 20:18:46,877 epoch 2 - iter 78/136 - loss 0.18635291 - time (sec): 7.99 - samples/sec: 3574.94 - lr: 0.000047 - momentum: 0.000000
2023-10-16 20:18:48,524 epoch 2 - iter 91/136 - loss 0.17676252 - time (sec): 9.64 - samples/sec: 3538.90 - lr: 0.000046 - momentum: 0.000000
2023-10-16 20:18:49,939 epoch 2 - iter 104/136 - loss 0.16965661 - time (sec): 11.05 - samples/sec: 3571.79 - lr: 0.000046 - momentum: 0.000000
2023-10-16 20:18:51,488 epoch 2 - iter 117/136 - loss 0.16802195 - time (sec): 12.60 - samples/sec: 3573.08 - lr: 0.000045 - momentum: 0.000000
2023-10-16 20:18:52,961 epoch 2 - iter 130/136 - loss 0.16391174 - time (sec): 14.08 - samples/sec: 3553.33 - lr: 0.000045 - momentum: 0.000000
2023-10-16 20:18:53,453 ----------------------------------------------------------------------------------------------------
2023-10-16 20:18:53,453 EPOCH 2 done: loss 0.1653 - lr: 0.000045
2023-10-16 20:18:54,968 DEV : loss 0.14248891174793243 - f1-score (micro avg)  0.691
2023-10-16 20:18:54,972 saving best model
2023-10-16 20:18:55,435 ----------------------------------------------------------------------------------------------------
2023-10-16 20:18:56,982 epoch 3 - iter 13/136 - loss 0.10095872 - time (sec): 1.55 - samples/sec: 3543.51 - lr: 0.000044 - momentum: 0.000000
2023-10-16 20:18:58,419 epoch 3 - iter 26/136 - loss 0.10478173 - time (sec): 2.98 - samples/sec: 3660.03 - lr: 0.000043 - momentum: 0.000000
2023-10-16 20:18:59,853 epoch 3 - iter 39/136 - loss 0.09423993 - time (sec): 4.42 - samples/sec: 3552.60 - lr: 0.000043 - momentum: 0.000000
2023-10-16 20:19:01,429 epoch 3 - iter 52/136 - loss 0.09587897 - time (sec): 5.99 - samples/sec: 3509.06 - lr: 0.000042 - momentum: 0.000000
2023-10-16 20:19:02,600 epoch 3 - iter 65/136 - loss 0.09997083 - time (sec): 7.16 - samples/sec: 3492.72 - lr: 0.000042 - momentum: 0.000000
2023-10-16 20:19:03,950 epoch 3 - iter 78/136 - loss 0.09751295 - time (sec): 8.51 - samples/sec: 3489.66 - lr: 0.000041 - momentum: 0.000000
2023-10-16 20:19:05,435 epoch 3 - iter 91/136 - loss 0.09486572 - time (sec): 10.00 - samples/sec: 3532.44 - lr: 0.000041 - momentum: 0.000000
2023-10-16 20:19:06,726 epoch 3 - iter 104/136 - loss 0.09199902 - time (sec): 11.29 - samples/sec: 3559.17 - lr: 0.000040 - momentum: 0.000000
2023-10-16 20:19:08,133 epoch 3 - iter 117/136 - loss 0.08768794 - time (sec): 12.70 - samples/sec: 3551.74 - lr: 0.000040 - momentum: 0.000000
2023-10-16 20:19:09,588 epoch 3 - iter 130/136 - loss 0.08763564 - time (sec): 14.15 - samples/sec: 3526.52 - lr: 0.000039 - momentum: 0.000000
2023-10-16 20:19:10,164 ----------------------------------------------------------------------------------------------------
2023-10-16 20:19:10,165 EPOCH 3 done: loss 0.0866 - lr: 0.000039
2023-10-16 20:19:11,832 DEV : loss 0.10237669199705124 - f1-score (micro avg)  0.8268
2023-10-16 20:19:11,836 saving best model
2023-10-16 20:19:12,290 ----------------------------------------------------------------------------------------------------
2023-10-16 20:19:13,716 epoch 4 - iter 13/136 - loss 0.08431575 - time (sec): 1.42 - samples/sec: 3427.07 - lr: 0.000038 - momentum: 0.000000
2023-10-16 20:19:15,238 epoch 4 - iter 26/136 - loss 0.06713900 - time (sec): 2.95 - samples/sec: 3482.32 - lr: 0.000038 - momentum: 0.000000
2023-10-16 20:19:16,482 epoch 4 - iter 39/136 - loss 0.05765410 - time (sec): 4.19 - samples/sec: 3551.75 - lr: 0.000037 - momentum: 0.000000
2023-10-16 20:19:17,877 epoch 4 - iter 52/136 - loss 0.05665901 - time (sec): 5.58 - samples/sec: 3682.64 - lr: 0.000037 - momentum: 0.000000
2023-10-16 20:19:19,212 epoch 4 - iter 65/136 - loss 0.05979582 - time (sec): 6.92 - samples/sec: 3637.82 - lr: 0.000036 - momentum: 0.000000
2023-10-16 20:19:20,790 epoch 4 - iter 78/136 - loss 0.05790229 - time (sec): 8.50 - samples/sec: 3623.54 - lr: 0.000036 - momentum: 0.000000
2023-10-16 20:19:22,120 epoch 4 - iter 91/136 - loss 0.05693213 - time (sec): 9.83 - samples/sec: 3606.14 - lr: 0.000035 - momentum: 0.000000
2023-10-16 20:19:23,630 epoch 4 - iter 104/136 - loss 0.05628982 - time (sec): 11.34 - samples/sec: 3579.50 - lr: 0.000035 - momentum: 0.000000
2023-10-16 20:19:25,016 epoch 4 - iter 117/136 - loss 0.05352644 - time (sec): 12.72 - samples/sec: 3590.41 - lr: 0.000034 - momentum: 0.000000
2023-10-16 20:19:26,241 epoch 4 - iter 130/136 - loss 0.05384621 - time (sec): 13.95 - samples/sec: 3567.78 - lr: 0.000034 - momentum: 0.000000
2023-10-16 20:19:26,852 ----------------------------------------------------------------------------------------------------
2023-10-16 20:19:26,852 EPOCH 4 done: loss 0.0531 - lr: 0.000034
2023-10-16 20:19:28,342 DEV : loss 0.1108362078666687 - f1-score (micro avg)  0.792
2023-10-16 20:19:28,347 ----------------------------------------------------------------------------------------------------
2023-10-16 20:19:29,788 epoch 5 - iter 13/136 - loss 0.04268788 - time (sec): 1.44 - samples/sec: 3396.75 - lr: 0.000033 - momentum: 0.000000
2023-10-16 20:19:31,041 epoch 5 - iter 26/136 - loss 0.03670220 - time (sec): 2.69 - samples/sec: 3419.76 - lr: 0.000032 - momentum: 0.000000
2023-10-16 20:19:32,272 epoch 5 - iter 39/136 - loss 0.04010108 - time (sec): 3.92 - samples/sec: 3646.72 - lr: 0.000032 - momentum: 0.000000
2023-10-16 20:19:33,713 epoch 5 - iter 52/136 - loss 0.04362300 - time (sec): 5.37 - samples/sec: 3515.55 - lr: 0.000031 - momentum: 0.000000
2023-10-16 20:19:35,275 epoch 5 - iter 65/136 - loss 0.03800478 - time (sec): 6.93 - samples/sec: 3481.28 - lr: 0.000031 - momentum: 0.000000
2023-10-16 20:19:36,599 epoch 5 - iter 78/136 - loss 0.03549079 - time (sec): 8.25 - samples/sec: 3546.96 - lr: 0.000030 - momentum: 0.000000
2023-10-16 20:19:37,986 epoch 5 - iter 91/136 - loss 0.03458614 - time (sec): 9.64 - samples/sec: 3543.08 - lr: 0.000030 - momentum: 0.000000
2023-10-16 20:19:39,580 epoch 5 - iter 104/136 - loss 0.03257155 - time (sec): 11.23 - samples/sec: 3549.79 - lr: 0.000029 - momentum: 0.000000
2023-10-16 20:19:40,937 epoch 5 - iter 117/136 - loss 0.03505263 - time (sec): 12.59 - samples/sec: 3533.61 - lr: 0.000029 - momentum: 0.000000
2023-10-16 20:19:42,362 epoch 5 - iter 130/136 - loss 0.03522794 - time (sec): 14.01 - samples/sec: 3540.70 - lr: 0.000028 - momentum: 0.000000
2023-10-16 20:19:43,035 ----------------------------------------------------------------------------------------------------
2023-10-16 20:19:43,036 EPOCH 5 done: loss 0.0344 - lr: 0.000028
2023-10-16 20:19:44,757 DEV : loss 0.12420879304409027 - f1-score (micro avg)  0.8015
2023-10-16 20:19:44,762 ----------------------------------------------------------------------------------------------------
2023-10-16 20:19:46,094 epoch 6 - iter 13/136 - loss 0.01916100 - time (sec): 1.33 - samples/sec: 3326.19 - lr: 0.000027 - momentum: 0.000000
2023-10-16 20:19:47,511 epoch 6 - iter 26/136 - loss 0.02420091 - time (sec): 2.75 - samples/sec: 3358.03 - lr: 0.000027 - momentum: 0.000000
2023-10-16 20:19:49,067 epoch 6 - iter 39/136 - loss 0.02128442 - time (sec): 4.30 - samples/sec: 3389.40 - lr: 0.000026 - momentum: 0.000000
2023-10-16 20:19:50,714 epoch 6 - iter 52/136 - loss 0.02244990 - time (sec): 5.95 - samples/sec: 3469.10 - lr: 0.000026 - momentum: 0.000000
2023-10-16 20:19:52,211 epoch 6 - iter 65/136 - loss 0.02199431 - time (sec): 7.45 - samples/sec: 3475.38 - lr: 0.000025 - momentum: 0.000000
2023-10-16 20:19:53,518 epoch 6 - iter 78/136 - loss 0.02434853 - time (sec): 8.76 - samples/sec: 3477.60 - lr: 0.000025 - momentum: 0.000000
2023-10-16 20:19:54,879 epoch 6 - iter 91/136 - loss 0.02513952 - time (sec): 10.12 - samples/sec: 3445.86 - lr: 0.000024 - momentum: 0.000000
2023-10-16 20:19:56,241 epoch 6 - iter 104/136 - loss 0.02469020 - time (sec): 11.48 - samples/sec: 3465.60 - lr: 0.000024 - momentum: 0.000000
2023-10-16 20:19:57,761 epoch 6 - iter 117/136 - loss 0.02336871 - time (sec): 13.00 - samples/sec: 3473.17 - lr: 0.000023 - momentum: 0.000000
2023-10-16 20:19:59,055 epoch 6 - iter 130/136 - loss 0.02303322 - time (sec): 14.29 - samples/sec: 3518.84 - lr: 0.000023 - momentum: 0.000000
2023-10-16 20:19:59,504 ----------------------------------------------------------------------------------------------------
2023-10-16 20:19:59,505 EPOCH 6 done: loss 0.0235 - lr: 0.000023
2023-10-16 20:20:01,009 DEV : loss 0.1299697607755661 - f1-score (micro avg)  0.82
2023-10-16 20:20:01,015 ----------------------------------------------------------------------------------------------------
2023-10-16 20:20:02,378 epoch 7 - iter 13/136 - loss 0.02201266 - time (sec): 1.36 - samples/sec: 3644.27 - lr: 0.000022 - momentum: 0.000000
2023-10-16 20:20:04,027 epoch 7 - iter 26/136 - loss 0.01585502 - time (sec): 3.01 - samples/sec: 3607.62 - lr: 0.000021 - momentum: 0.000000
2023-10-16 20:20:05,485 epoch 7 - iter 39/136 - loss 0.01813674 - time (sec): 4.47 - samples/sec: 3493.68 - lr: 0.000021 - momentum: 0.000000
2023-10-16 20:20:07,060 epoch 7 - iter 52/136 - loss 0.01977797 - time (sec): 6.04 - samples/sec: 3551.44 - lr: 0.000020 - momentum: 0.000000
2023-10-16 20:20:08,369 epoch 7 - iter 65/136 - loss 0.01878455 - time (sec): 7.35 - samples/sec: 3549.15 - lr: 0.000020 - momentum: 0.000000
2023-10-16 20:20:09,893 epoch 7 - iter 78/136 - loss 0.01847453 - time (sec): 8.88 - samples/sec: 3555.26 - lr: 0.000019 - momentum: 0.000000
2023-10-16 20:20:11,172 epoch 7 - iter 91/136 - loss 0.01767757 - time (sec): 10.16 - samples/sec: 3562.64 - lr: 0.000019 - momentum: 0.000000
2023-10-16 20:20:12,435 epoch 7 - iter 104/136 - loss 0.01671361 - time (sec): 11.42 - samples/sec: 3581.10 - lr: 0.000018 - momentum: 0.000000
2023-10-16 20:20:13,763 epoch 7 - iter 117/136 - loss 0.01655957 - time (sec): 12.75 - samples/sec: 3559.24 - lr: 0.000018 - momentum: 0.000000
2023-10-16 20:20:15,116 epoch 7 - iter 130/136 - loss 0.01641038 - time (sec): 14.10 - samples/sec: 3556.12 - lr: 0.000017 - momentum: 0.000000
2023-10-16 20:20:15,632 ----------------------------------------------------------------------------------------------------
2023-10-16 20:20:15,632 EPOCH 7 done: loss 0.0160 - lr: 0.000017
2023-10-16 20:20:17,148 DEV : loss 0.13152331113815308 - f1-score (micro avg)  0.8315
2023-10-16 20:20:17,154 saving best model
2023-10-16 20:20:17,607 ----------------------------------------------------------------------------------------------------
2023-10-16 20:20:19,128 epoch 8 - iter 13/136 - loss 0.01402109 - time (sec): 1.52 - samples/sec: 2768.32 - lr: 0.000016 - momentum: 0.000000
2023-10-16 20:20:20,447 epoch 8 - iter 26/136 - loss 0.01902129 - time (sec): 2.84 - samples/sec: 3178.52 - lr: 0.000016 - momentum: 0.000000
2023-10-16 20:20:21,607 epoch 8 - iter 39/136 - loss 0.01608610 - time (sec): 4.00 - samples/sec: 3256.67 - lr: 0.000015 - momentum: 0.000000
2023-10-16 20:20:23,057 epoch 8 - iter 52/136 - loss 0.01326992 - time (sec): 5.45 - samples/sec: 3408.57 - lr: 0.000015 - momentum: 0.000000
2023-10-16 20:20:24,346 epoch 8 - iter 65/136 - loss 0.01198353 - time (sec): 6.74 - samples/sec: 3474.97 - lr: 0.000014 - momentum: 0.000000
2023-10-16 20:20:25,975 epoch 8 - iter 78/136 - loss 0.01138949 - time (sec): 8.37 - samples/sec: 3455.20 - lr: 0.000014 - momentum: 0.000000
2023-10-16 20:20:27,579 epoch 8 - iter 91/136 - loss 0.01277453 - time (sec): 9.97 - samples/sec: 3455.56 - lr: 0.000013 - momentum: 0.000000
2023-10-16 20:20:28,873 epoch 8 - iter 104/136 - loss 0.01291802 - time (sec): 11.26 - samples/sec: 3477.15 - lr: 0.000013 - momentum: 0.000000
2023-10-16 20:20:30,352 epoch 8 - iter 117/136 - loss 0.01246003 - time (sec): 12.74 - samples/sec: 3471.33 - lr: 0.000012 - momentum: 0.000000
2023-10-16 20:20:31,831 epoch 8 - iter 130/136 - loss 0.01146785 - time (sec): 14.22 - samples/sec: 3491.93 - lr: 0.000012 - momentum: 0.000000
2023-10-16 20:20:32,488 ----------------------------------------------------------------------------------------------------
2023-10-16 20:20:32,488 EPOCH 8 done: loss 0.0123 - lr: 0.000012
2023-10-16 20:20:33,976 DEV : loss 0.15120868384838104 - f1-score (micro avg)  0.8192
2023-10-16 20:20:33,981 ----------------------------------------------------------------------------------------------------
2023-10-16 20:20:35,606 epoch 9 - iter 13/136 - loss 0.00940542 - time (sec): 1.62 - samples/sec: 3592.17 - lr: 0.000011 - momentum: 0.000000
2023-10-16 20:20:36,941 epoch 9 - iter 26/136 - loss 0.00670356 - time (sec): 2.96 - samples/sec: 3633.67 - lr: 0.000010 - momentum: 0.000000
2023-10-16 20:20:38,165 epoch 9 - iter 39/136 - loss 0.00947777 - time (sec): 4.18 - samples/sec: 3542.21 - lr: 0.000010 - momentum: 0.000000
2023-10-16 20:20:39,685 epoch 9 - iter 52/136 - loss 0.00891002 - time (sec): 5.70 - samples/sec: 3538.63 - lr: 0.000009 - momentum: 0.000000
2023-10-16 20:20:41,079 epoch 9 - iter 65/136 - loss 0.00918508 - time (sec): 7.10 - samples/sec: 3529.78 - lr: 0.000009 - momentum: 0.000000
2023-10-16 20:20:42,396 epoch 9 - iter 78/136 - loss 0.00934079 - time (sec): 8.41 - samples/sec: 3592.02 - lr: 0.000008 - momentum: 0.000000
2023-10-16 20:20:43,945 epoch 9 - iter 91/136 - loss 0.00873731 - time (sec): 9.96 - samples/sec: 3549.63 - lr: 0.000008 - momentum: 0.000000
2023-10-16 20:20:45,515 epoch 9 - iter 104/136 - loss 0.00839287 - time (sec): 11.53 - samples/sec: 3560.14 - lr: 0.000007 - momentum: 0.000000
2023-10-16 20:20:46,878 epoch 9 - iter 117/136 - loss 0.00857232 - time (sec): 12.90 - samples/sec: 3579.75 - lr: 0.000007 - momentum: 0.000000
2023-10-16 20:20:48,163 epoch 9 - iter 130/136 - loss 0.01058054 - time (sec): 14.18 - samples/sec: 3549.23 - lr: 0.000006 - momentum: 0.000000
2023-10-16 20:20:48,734 ----------------------------------------------------------------------------------------------------
2023-10-16 20:20:48,735 EPOCH 9 done: loss 0.0106 - lr: 0.000006
2023-10-16 20:20:50,191 DEV : loss 0.14968341588974 - f1-score (micro avg)  0.8175
2023-10-16 20:20:50,195 ----------------------------------------------------------------------------------------------------
2023-10-16 20:20:51,766 epoch 10 - iter 13/136 - loss 0.00201920 - time (sec): 1.57 - samples/sec: 3363.38 - lr: 0.000005 - momentum: 0.000000
2023-10-16 20:20:52,880 epoch 10 - iter 26/136 - loss 0.00257635 - time (sec): 2.68 - samples/sec: 3418.54 - lr: 0.000005 - momentum: 0.000000
2023-10-16 20:20:54,377 epoch 10 - iter 39/136 - loss 0.00284480 - time (sec): 4.18 - samples/sec: 3228.03 - lr: 0.000004 - momentum: 0.000000
2023-10-16 20:20:55,815 epoch 10 - iter 52/136 - loss 0.00540897 - time (sec): 5.62 - samples/sec: 3338.60 - lr: 0.000004 - momentum: 0.000000
2023-10-16 20:20:57,292 epoch 10 - iter 65/136 - loss 0.00496823 - time (sec): 7.10 - samples/sec: 3352.13 - lr: 0.000003 - momentum: 0.000000
2023-10-16 20:20:58,618 epoch 10 - iter 78/136 - loss 0.00534676 - time (sec): 8.42 - samples/sec: 3351.23 - lr: 0.000003 - momentum: 0.000000
2023-10-16 20:21:00,051 epoch 10 - iter 91/136 - loss 0.00574925 - time (sec): 9.85 - samples/sec: 3392.70 - lr: 0.000002 - momentum: 0.000000
2023-10-16 20:21:01,333 epoch 10 - iter 104/136 - loss 0.00664022 - time (sec): 11.14 - samples/sec: 3429.83 - lr: 0.000002 - momentum: 0.000000
2023-10-16 20:21:02,990 epoch 10 - iter 117/136 - loss 0.00799959 - time (sec): 12.79 - samples/sec: 3422.62 - lr: 0.000001 - momentum: 0.000000
2023-10-16 20:21:04,396 epoch 10 - iter 130/136 - loss 0.00826072 - time (sec): 14.20 - samples/sec: 3483.83 - lr: 0.000000 - momentum: 0.000000
2023-10-16 20:21:05,155 ----------------------------------------------------------------------------------------------------
2023-10-16 20:21:05,155 EPOCH 10 done: loss 0.0080 - lr: 0.000000
2023-10-16 20:21:06,676 DEV : loss 0.15046021342277527 - f1-score (micro avg)  0.8324
2023-10-16 20:21:06,681 saving best model
2023-10-16 20:21:07,513 ----------------------------------------------------------------------------------------------------
2023-10-16 20:21:07,514 Loading model from best epoch ...
2023-10-16 20:21:09,210 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-16 20:21:11,380
Results:
- F-score (micro) 0.7767
- F-score (macro) 0.7354
- Accuracy 0.6543

By class:
              precision    recall  f1-score   support

         LOC     0.8112    0.8814    0.8449       312
         PER     0.6628    0.8221    0.7339       208
         ORG     0.5306    0.4727    0.5000        55
   HumanProd     0.7586    1.0000    0.8627        22

   micro avg     0.7319    0.8275    0.7767       597
   macro avg     0.6908    0.7941    0.7354       597
weighted avg     0.7317    0.8275    0.7751       597

2023-10-16 20:21:11,380 ----------------------------------------------------------------------------------------------------
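Note: the 17-entry tag dictionary reported above ("O" plus S-/B-/E-/I- variants of LOC, PER, HumanProd, and ORG) matches the model's final `Linear(in_features=768, out_features=17)` layer. A minimal sketch of how such a BIOES tag set is enumerated; the function name is illustrative, not Flair's API:

```python
def bioes_tag_dictionary(entity_types):
    """Enumerate a BIOES tag set: 'O' plus S-/B-/E-/I- for each entity type."""
    tags = ["O"]
    for entity_type in entity_types:
        tags.extend(f"{prefix}-{entity_type}" for prefix in ("S", "B", "E", "I"))
    return tags

# Order taken from the dictionary logged above.
tags = bioes_tag_dictionary(["LOC", "PER", "HumanProd", "ORG"])
# 1 + 4 * 4 = 17 tags, matching out_features=17 in the model card
```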
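The lr column follows the LinearScheduler plugin with warmup_fraction '0.1': the learning rate ramps toward the peak 5e-05 over roughly the first 10% of the 1,360 total steps (10 epochs x 136 iterations), i.e. during epoch 1, then decays linearly to 0.000000 by the last step of epoch 10. A rough sketch of that shape, assuming this step accounting (Flair's internal bookkeeping may differ by an off-by-one or by logging the post-step value):

```python
def linear_warmup_lr(step, total_steps=1360, peak_lr=5e-5, warmup_fraction=0.1):
    """Linear warmup to peak_lr over the first warmup_fraction of steps,
    then linear decay back to zero."""
    warmup_steps = int(total_steps * warmup_fraction)  # 136 steps = epoch 1
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)
```

Under these assumptions the peak is reached at step 136 (consistent with "lr: 0.000050" early in epoch 2) and the rate hits zero at step 1360 (consistent with "lr: 0.000000" at the end of epoch 10).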
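The final "By class" results are internally consistent: micro-averaged scores pool true positives and predictions across all classes, while the macro F1 is the unweighted mean of per-class F1. A quick arithmetic check of the reported numbers, reconstructing integer counts from recall x support and precision (standard formulas, not Flair code):

```python
# Per class: (precision, recall, support) copied from the results table above.
by_class = {
    "LOC":       (0.8112, 0.8814, 312),
    "PER":       (0.6628, 0.8221, 208),
    "ORG":       (0.5306, 0.4727, 55),
    "HumanProd": (0.7586, 1.0000, 22),
}

# Reconstruct counts: TP = recall * support, predicted = TP / precision.
tp = {c: round(r * s) for c, (p, r, s) in by_class.items()}
pred = {c: round(tp[c] / p) for c, (p, r, s) in by_class.items()}

# Micro averages pool counts over classes before computing P/R/F1.
micro_p = sum(tp.values()) / sum(pred.values())
micro_r = sum(tp.values()) / sum(s for _, _, s in by_class.values())
micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)

# Macro F1 is the unweighted mean of per-class F1 scores.
macro_f1 = sum(2 * p * r / (p + r) for p, r, _ in by_class.values()) / len(by_class)
```

This reproduces the logged micro avg (0.7319 / 0.8275 / 0.7767 over 597 gold spans) and macro F-score (0.7354), which also explains why micro F1 sits above macro F1 here: the weakest class (ORG, F1 0.5000) has only 55 of the 597 support spans.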