2023-10-24 17:53:07,606 ----------------------------------------------------------------------------------------------------
2023-10-24 17:53:07,607 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(64001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-24 17:53:07,607 ----------------------------------------------------------------------------------------------------
2023-10-24 17:53:07,607 MultiCorpus: 7936 train + 992 dev + 992 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 7936 train + 992 dev + 992 test sentences - /home/ubuntu/.flair/datasets/ner_icdar_europeana/fr
2023-10-24 17:53:07,607 ----------------------------------------------------------------------------------------------------
2023-10-24 17:53:07,607 Train:  7936 sentences
2023-10-24 17:53:07,607         (train_with_dev=False, train_with_test=False)
2023-10-24 17:53:07,607 ----------------------------------------------------------------------------------------------------
2023-10-24 17:53:07,607 Training Params:
2023-10-24 17:53:07,607  - learning_rate: "5e-05"
2023-10-24 17:53:07,607  - mini_batch_size: "8"
2023-10-24 17:53:07,607  - max_epochs: "10"
2023-10-24 17:53:07,607  - shuffle: "True"
2023-10-24 17:53:07,607 ----------------------------------------------------------------------------------------------------
2023-10-24 17:53:07,607 Plugins:
2023-10-24 17:53:07,607  - TensorboardLogger
2023-10-24 17:53:07,607  - LinearScheduler | warmup_fraction: '0.1'
2023-10-24 17:53:07,607 ----------------------------------------------------------------------------------------------------
2023-10-24 17:53:07,607 Final evaluation on model from best epoch (best-model.pt)
2023-10-24 17:53:07,608  - metric: "('micro avg', 'f1-score')"
2023-10-24 17:53:07,608 ----------------------------------------------------------------------------------------------------
2023-10-24 17:53:07,608 Computation:
2023-10-24 17:53:07,608  - compute on device: cuda:0
2023-10-24 17:53:07,608  - embedding storage: none
2023-10-24 17:53:07,608 ----------------------------------------------------------------------------------------------------
2023-10-24 17:53:07,608 Model training base path: "hmbench-icdar/fr-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs8-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-3"
2023-10-24 17:53:07,608 ----------------------------------------------------------------------------------------------------
2023-10-24 17:53:07,608 ----------------------------------------------------------------------------------------------------
2023-10-24 17:53:07,608 Logging anything other than scalars to TensorBoard is currently not supported.
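[Editor's note: the configuration logged above (model, corpus, batch size, epochs, learning rate, warmup fraction; 13 output classes = BIOES-encoded PER/LOC/ORG plus O, as listed at the end of this log) can be approximated with a short Flair script. This is a hedged reconstruction sketch, not the exact hmBench launcher; API details may vary by Flair version, and the TensorboardLogger plugin is omitted.]

# Sketch of a Flair fine-tuning run matching the parameters logged above.
from flair.datasets import NER_ICDAR_EUROPEANA
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

corpus = NER_ICDAR_EUROPEANA(language="fr")  # 7936 train / 992 dev / 992 test
label_dict = corpus.make_label_dictionary(label_type="ner")

embeddings = TransformerWordEmbeddings(
    model="dbmdz/bert-base-historic-multilingual-64k-td-cased",
    layers="-1",               # last layer only, per "layers-1" in the base path
    subtoken_pooling="first",  # per "poolingfirst" in the base path
    fine_tune=True,
)

# No RNN and no CRF, matching the logged architecture (locked dropout + linear head)
tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,             # per "crfFalse" in the base path
    use_rnn=False,
)

trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "hmbench-icdar/fr-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs8-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-3",
    learning_rate=5e-05,
    mini_batch_size=8,
    max_epochs=10,
    warmup_fraction=0.1,       # installs the LinearScheduler plugin logged above
)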
2023-10-24 17:53:16,110 epoch 1 - iter 99/992 - loss 1.45019353 - time (sec): 8.50 - samples/sec: 2052.11 - lr: 0.000005 - momentum: 0.000000
2023-10-24 17:53:24,479 epoch 1 - iter 198/992 - loss 0.90651188 - time (sec): 16.87 - samples/sec: 1995.74 - lr: 0.000010 - momentum: 0.000000
2023-10-24 17:53:32,526 epoch 1 - iter 297/992 - loss 0.68242155 - time (sec): 24.92 - samples/sec: 1970.50 - lr: 0.000015 - momentum: 0.000000
2023-10-24 17:53:40,903 epoch 1 - iter 396/992 - loss 0.55232472 - time (sec): 33.29 - samples/sec: 1971.25 - lr: 0.000020 - momentum: 0.000000
2023-10-24 17:53:49,010 epoch 1 - iter 495/992 - loss 0.47511537 - time (sec): 41.40 - samples/sec: 1964.36 - lr: 0.000025 - momentum: 0.000000
2023-10-24 17:53:57,155 epoch 1 - iter 594/992 - loss 0.42026811 - time (sec): 49.55 - samples/sec: 1960.16 - lr: 0.000030 - momentum: 0.000000
2023-10-24 17:54:05,777 epoch 1 - iter 693/992 - loss 0.37516782 - time (sec): 58.17 - samples/sec: 1957.77 - lr: 0.000035 - momentum: 0.000000
2023-10-24 17:54:14,242 epoch 1 - iter 792/992 - loss 0.34277188 - time (sec): 66.63 - samples/sec: 1955.19 - lr: 0.000040 - momentum: 0.000000
2023-10-24 17:54:22,644 epoch 1 - iter 891/992 - loss 0.32033942 - time (sec): 75.04 - samples/sec: 1963.07 - lr: 0.000045 - momentum: 0.000000
2023-10-24 17:54:31,054 epoch 1 - iter 990/992 - loss 0.30228680 - time (sec): 83.45 - samples/sec: 1960.73 - lr: 0.000050 - momentum: 0.000000
2023-10-24 17:54:31,234 ----------------------------------------------------------------------------------------------------
2023-10-24 17:54:31,235 EPOCH 1 done: loss 0.3019 - lr: 0.000050
2023-10-24 17:54:34,306 DEV : loss 0.08691307157278061 - f1-score (micro avg)  0.7201
2023-10-24 17:54:34,321 saving best model
2023-10-24 17:54:34,791 ----------------------------------------------------------------------------------------------------
2023-10-24 17:54:42,930 epoch 2 - iter 99/992 - loss 0.09527419 - time (sec): 8.14 - samples/sec: 2003.53 - lr: 0.000049 - momentum: 0.000000
2023-10-24 17:54:51,239 epoch 2 - iter 198/992 - loss 0.09556154 - time (sec): 16.45 - samples/sec: 1975.24 - lr: 0.000049 - momentum: 0.000000
2023-10-24 17:54:59,412 epoch 2 - iter 297/992 - loss 0.09905757 - time (sec): 24.62 - samples/sec: 1984.65 - lr: 0.000048 - momentum: 0.000000
2023-10-24 17:55:07,962 epoch 2 - iter 396/992 - loss 0.10241445 - time (sec): 33.17 - samples/sec: 1977.76 - lr: 0.000048 - momentum: 0.000000
2023-10-24 17:55:16,323 epoch 2 - iter 495/992 - loss 0.10157096 - time (sec): 41.53 - samples/sec: 1984.19 - lr: 0.000047 - momentum: 0.000000
2023-10-24 17:55:24,569 epoch 2 - iter 594/992 - loss 0.10195348 - time (sec): 49.78 - samples/sec: 1983.29 - lr: 0.000047 - momentum: 0.000000
2023-10-24 17:55:33,030 epoch 2 - iter 693/992 - loss 0.10065037 - time (sec): 58.24 - samples/sec: 1983.87 - lr: 0.000046 - momentum: 0.000000
2023-10-24 17:55:41,369 epoch 2 - iter 792/992 - loss 0.09922714 - time (sec): 66.58 - samples/sec: 1970.91 - lr: 0.000046 - momentum: 0.000000
2023-10-24 17:55:49,709 epoch 2 - iter 891/992 - loss 0.09994013 - time (sec): 74.92 - samples/sec: 1964.70 - lr: 0.000045 - momentum: 0.000000
2023-10-24 17:55:58,196 epoch 2 - iter 990/992 - loss 0.10114388 - time (sec): 83.40 - samples/sec: 1963.04 - lr: 0.000044 - momentum: 0.000000
2023-10-24 17:55:58,341 ----------------------------------------------------------------------------------------------------
2023-10-24 17:55:58,341 EPOCH 2 done: loss 0.1011 - lr: 0.000044
2023-10-24 17:56:01,444 DEV : loss 0.09098362177610397 - f1-score (micro avg)  0.743
2023-10-24 17:56:01,459 saving best model
2023-10-24 17:56:02,049 ----------------------------------------------------------------------------------------------------
2023-10-24 17:56:10,259 epoch 3 - iter 99/992 - loss 0.06165896 - time (sec): 8.21 - samples/sec: 1971.22 - lr: 0.000044 - momentum: 0.000000
2023-10-24 17:56:18,758 epoch 3 - iter 198/992 - loss 0.06593462 - time (sec): 16.71 - samples/sec: 1971.74 - lr: 0.000043 - momentum: 0.000000
2023-10-24 17:56:27,234 epoch 3 - iter 297/992 - loss 0.07089123 - time (sec): 25.18 - samples/sec: 1940.51 - lr: 0.000043 - momentum: 0.000000
2023-10-24 17:56:35,331 epoch 3 - iter 396/992 - loss 0.06905276 - time (sec): 33.28 - samples/sec: 1946.67 - lr: 0.000042 - momentum: 0.000000
2023-10-24 17:56:44,066 epoch 3 - iter 495/992 - loss 0.06665018 - time (sec): 42.02 - samples/sec: 1961.30 - lr: 0.000042 - momentum: 0.000000
2023-10-24 17:56:52,473 epoch 3 - iter 594/992 - loss 0.06883315 - time (sec): 50.42 - samples/sec: 1959.12 - lr: 0.000041 - momentum: 0.000000
2023-10-24 17:57:00,614 epoch 3 - iter 693/992 - loss 0.06972989 - time (sec): 58.56 - samples/sec: 1959.41 - lr: 0.000041 - momentum: 0.000000
2023-10-24 17:57:09,003 epoch 3 - iter 792/992 - loss 0.06920701 - time (sec): 66.95 - samples/sec: 1961.79 - lr: 0.000040 - momentum: 0.000000
2023-10-24 17:57:17,427 epoch 3 - iter 891/992 - loss 0.06875715 - time (sec): 75.38 - samples/sec: 1961.18 - lr: 0.000039 - momentum: 0.000000
2023-10-24 17:57:25,554 epoch 3 - iter 990/992 - loss 0.06866266 - time (sec): 83.50 - samples/sec: 1960.90 - lr: 0.000039 - momentum: 0.000000
2023-10-24 17:57:25,711 ----------------------------------------------------------------------------------------------------
2023-10-24 17:57:25,711 EPOCH 3 done: loss 0.0686 - lr: 0.000039
2023-10-24 17:57:28,825 DEV : loss 0.10797995328903198 - f1-score (micro avg)  0.7225
2023-10-24 17:57:28,840 ----------------------------------------------------------------------------------------------------
2023-10-24 17:57:36,940 epoch 4 - iter 99/992 - loss 0.04262884 - time (sec): 8.10 - samples/sec: 1951.86 - lr: 0.000038 - momentum: 0.000000
2023-10-24 17:57:45,681 epoch 4 - iter 198/992 - loss 0.04995344 - time (sec): 16.84 - samples/sec: 1947.00 - lr: 0.000038 - momentum: 0.000000
2023-10-24 17:57:54,126 epoch 4 - iter 297/992 - loss 0.04925014 - time (sec): 25.28 - samples/sec: 1947.25 - lr: 0.000037 - momentum: 0.000000
2023-10-24 17:58:02,267 epoch 4 - iter 396/992 - loss 0.05068279 - time (sec): 33.43 - samples/sec: 1948.76 - lr: 0.000037 - momentum: 0.000000
2023-10-24 17:58:10,475 epoch 4 - iter 495/992 - loss 0.05043456 - time (sec): 41.63 - samples/sec: 1960.24 - lr: 0.000036 - momentum: 0.000000
2023-10-24 17:58:18,115 epoch 4 - iter 594/992 - loss 0.04948632 - time (sec): 49.27 - samples/sec: 1955.28 - lr: 0.000036 - momentum: 0.000000
2023-10-24 17:58:26,645 epoch 4 - iter 693/992 - loss 0.05029241 - time (sec): 57.80 - samples/sec: 1963.09 - lr: 0.000035 - momentum: 0.000000
2023-10-24 17:58:35,036 epoch 4 - iter 792/992 - loss 0.05054238 - time (sec): 66.19 - samples/sec: 1959.75 - lr: 0.000034 - momentum: 0.000000
2023-10-24 17:58:43,142 epoch 4 - iter 891/992 - loss 0.04973429 - time (sec): 74.30 - samples/sec: 1968.56 - lr: 0.000034 - momentum: 0.000000
2023-10-24 17:58:52,072 epoch 4 - iter 990/992 - loss 0.04918421 - time (sec): 83.23 - samples/sec: 1966.28 - lr: 0.000033 - momentum: 0.000000
2023-10-24 17:58:52,222 ----------------------------------------------------------------------------------------------------
2023-10-24 17:58:52,222 EPOCH 4 done: loss 0.0491 - lr: 0.000033
2023-10-24 17:58:55,339 DEV : loss 0.16018341481685638 - f1-score (micro avg)  0.7368
2023-10-24 17:58:55,354 ----------------------------------------------------------------------------------------------------
2023-10-24 17:59:03,855 epoch 5 - iter 99/992 - loss 0.03275354 - time (sec): 8.50 - samples/sec: 1998.54 - lr: 0.000033 - momentum: 0.000000
2023-10-24 17:59:12,128 epoch 5 - iter 198/992 - loss 0.03475824 - time (sec): 16.77 - samples/sec: 1968.33 - lr: 0.000032 - momentum: 0.000000
2023-10-24 17:59:20,968 epoch 5 - iter 297/992 - loss 0.03421394 - time (sec): 25.61 - samples/sec: 1935.93 - lr: 0.000032 - momentum: 0.000000
2023-10-24 17:59:29,192 epoch 5 - iter 396/992 - loss 0.03443327 - time (sec): 33.84 - samples/sec: 1928.20 - lr: 0.000031 - momentum: 0.000000
2023-10-24 17:59:37,436 epoch 5 - iter 495/992 - loss 0.03822480 - time (sec): 42.08 - samples/sec: 1946.88 - lr: 0.000031 - momentum: 0.000000
2023-10-24 17:59:45,453 epoch 5 - iter 594/992 - loss 0.03666575 - time (sec): 50.10 - samples/sec: 1952.70 - lr: 0.000030 - momentum: 0.000000
2023-10-24 17:59:54,159 epoch 5 - iter 693/992 - loss 0.03753706 - time (sec): 58.80 - samples/sec: 1951.41 - lr: 0.000029 - momentum: 0.000000
2023-10-24 18:00:02,511 epoch 5 - iter 792/992 - loss 0.03848741 - time (sec): 67.16 - samples/sec: 1952.21 - lr: 0.000029 - momentum: 0.000000
2023-10-24 18:00:10,602 epoch 5 - iter 891/992 - loss 0.03850444 - time (sec): 75.25 - samples/sec: 1953.08 - lr: 0.000028 - momentum: 0.000000
2023-10-24 18:00:19,094 epoch 5 - iter 990/992 - loss 0.03746891 - time (sec): 83.74 - samples/sec: 1954.15 - lr: 0.000028 - momentum: 0.000000
2023-10-24 18:00:19,260 ----------------------------------------------------------------------------------------------------
2023-10-24 18:00:19,260 EPOCH 5 done: loss 0.0374 - lr: 0.000028
2023-10-24 18:00:22,383 DEV : loss 0.17979347705841064 - f1-score (micro avg)  0.7377
2023-10-24 18:00:22,399 ----------------------------------------------------------------------------------------------------
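[Editor's note: the lr values logged per iteration trace the LinearScheduler declared in the Plugins section: with warmup_fraction 0.1 and 992 iterations per epoch over 10 epochs (9920 steps total), the rate ramps linearly to the 5e-05 peak by the end of epoch 1 and then decays linearly to 0. A minimal sketch of that schedule; the function name and defaults are illustrative, not Flair API.]

# Expected learning rate at a given global step under linear warmup + decay.
def linear_schedule_lr(step, total_steps=9920, peak_lr=5e-05, warmup_fraction=0.1):
    warmup_steps = int(total_steps * warmup_fraction)  # 992 steps = 1 epoch here
    if step < warmup_steps:
        return peak_lr * step / warmup_steps           # linear warmup from 0
    # linear decay from peak_lr down to 0 over the remaining steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

print(linear_schedule_lr(99))    # ~0.000005, matches "epoch 1 - iter 99/992"
print(linear_schedule_lr(992))   # 0.000050, peak at the end of epoch 1
print(linear_schedule_lr(9920))  # 0.0, end of epoch 10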
2023-10-24 18:00:30,989 epoch 6 - iter 99/992 - loss 0.02323895 - time (sec): 8.59 - samples/sec: 1890.29 - lr: 0.000027 - momentum: 0.000000
2023-10-24 18:00:39,421 epoch 6 - iter 198/992 - loss 0.02299469 - time (sec): 17.02 - samples/sec: 1940.04 - lr: 0.000027 - momentum: 0.000000
2023-10-24 18:00:47,699 epoch 6 - iter 297/992 - loss 0.02440000 - time (sec): 25.30 - samples/sec: 1959.56 - lr: 0.000026 - momentum: 0.000000
2023-10-24 18:00:55,811 epoch 6 - iter 396/992 - loss 0.02517086 - time (sec): 33.41 - samples/sec: 1970.95 - lr: 0.000026 - momentum: 0.000000
2023-10-24 18:01:04,347 epoch 6 - iter 495/992 - loss 0.02699649 - time (sec): 41.95 - samples/sec: 1970.74 - lr: 0.000025 - momentum: 0.000000
2023-10-24 18:01:12,638 epoch 6 - iter 594/992 - loss 0.02691355 - time (sec): 50.24 - samples/sec: 1963.22 - lr: 0.000024 - momentum: 0.000000
2023-10-24 18:01:20,840 epoch 6 - iter 693/992 - loss 0.02767072 - time (sec): 58.44 - samples/sec: 1957.76 - lr: 0.000024 - momentum: 0.000000
2023-10-24 18:01:29,206 epoch 6 - iter 792/992 - loss 0.02671390 - time (sec): 66.81 - samples/sec: 1956.90 - lr: 0.000023 - momentum: 0.000000
2023-10-24 18:01:37,458 epoch 6 - iter 891/992 - loss 0.02828160 - time (sec): 75.06 - samples/sec: 1948.57 - lr: 0.000023 - momentum: 0.000000
2023-10-24 18:01:45,697 epoch 6 - iter 990/992 - loss 0.02810958 - time (sec): 83.30 - samples/sec: 1965.05 - lr: 0.000022 - momentum: 0.000000
2023-10-24 18:01:45,857 ----------------------------------------------------------------------------------------------------
2023-10-24 18:01:45,857 EPOCH 6 done: loss 0.0281 - lr: 0.000022
2023-10-24 18:01:48,981 DEV : loss 0.18152180314064026 - f1-score (micro avg)  0.7691
2023-10-24 18:01:48,996 saving best model
2023-10-24 18:01:49,629 ----------------------------------------------------------------------------------------------------
2023-10-24 18:01:58,369 epoch 7 - iter 99/992 - loss 0.02452345 - time (sec): 8.74 - samples/sec: 1920.30 - lr: 0.000022 - momentum: 0.000000
2023-10-24 18:02:06,462 epoch 7 - iter 198/992 - loss 0.02583170 - time (sec): 16.83 - samples/sec: 1928.62 - lr: 0.000021 - momentum: 0.000000
2023-10-24 18:02:15,105 epoch 7 - iter 297/992 - loss 0.02293195 - time (sec): 25.48 - samples/sec: 1913.00 - lr: 0.000021 - momentum: 0.000000
2023-10-24 18:02:23,523 epoch 7 - iter 396/992 - loss 0.01981722 - time (sec): 33.89 - samples/sec: 1901.12 - lr: 0.000020 - momentum: 0.000000
2023-10-24 18:02:31,697 epoch 7 - iter 495/992 - loss 0.01985616 - time (sec): 42.07 - samples/sec: 1910.50 - lr: 0.000019 - momentum: 0.000000
2023-10-24 18:02:40,392 epoch 7 - iter 594/992 - loss 0.01977100 - time (sec): 50.76 - samples/sec: 1926.84 - lr: 0.000019 - momentum: 0.000000
2023-10-24 18:02:48,921 epoch 7 - iter 693/992 - loss 0.02041123 - time (sec): 59.29 - samples/sec: 1935.23 - lr: 0.000018 - momentum: 0.000000
2023-10-24 18:02:57,131 epoch 7 - iter 792/992 - loss 0.02070652 - time (sec): 67.50 - samples/sec: 1941.00 - lr: 0.000018 - momentum: 0.000000
2023-10-24 18:03:05,223 epoch 7 - iter 891/992 - loss 0.02081829 - time (sec): 75.59 - samples/sec: 1949.40 - lr: 0.000017 - momentum: 0.000000
2023-10-24 18:03:13,362 epoch 7 - iter 990/992 - loss 0.02144692 - time (sec): 83.73 - samples/sec: 1952.78 - lr: 0.000017 - momentum: 0.000000
2023-10-24 18:03:13,536 ----------------------------------------------------------------------------------------------------
2023-10-24 18:03:13,536 EPOCH 7 done: loss 0.0214 - lr: 0.000017
2023-10-24 18:03:16,649 DEV : loss 0.18771061301231384 - f1-score (micro avg)  0.7667
2023-10-24 18:03:16,664 ----------------------------------------------------------------------------------------------------
2023-10-24 18:03:25,249 epoch 8 - iter 99/992 - loss 0.01897961 - time (sec): 8.58 - samples/sec: 2021.42 - lr: 0.000016 - momentum: 0.000000
2023-10-24 18:03:33,938 epoch 8 - iter 198/992 - loss 0.01571900 - time (sec): 17.27 - samples/sec: 1977.75 - lr: 0.000016 - momentum: 0.000000
2023-10-24 18:03:42,080 epoch 8 - iter 297/992 - loss 0.01464608 - time (sec): 25.41 - samples/sec: 1955.74 - lr: 0.000015 - momentum: 0.000000
2023-10-24 18:03:50,487 epoch 8 - iter 396/992 - loss 0.01451842 - time (sec): 33.82 - samples/sec: 1945.87 - lr: 0.000014 - momentum: 0.000000
2023-10-24 18:03:58,546 epoch 8 - iter 495/992 - loss 0.01464158 - time (sec): 41.88 - samples/sec: 1950.62 - lr: 0.000014 - momentum: 0.000000
2023-10-24 18:04:07,019 epoch 8 - iter 594/992 - loss 0.01494271 - time (sec): 50.35 - samples/sec: 1962.71 - lr: 0.000013 - momentum: 0.000000
2023-10-24 18:04:15,324 epoch 8 - iter 693/992 - loss 0.01427937 - time (sec): 58.66 - samples/sec: 1965.49 - lr: 0.000013 - momentum: 0.000000
2023-10-24 18:04:23,141 epoch 8 - iter 792/992 - loss 0.01444238 - time (sec): 66.48 - samples/sec: 1961.96 - lr: 0.000012 - momentum: 0.000000
2023-10-24 18:04:31,590 epoch 8 - iter 891/992 - loss 0.01470428 - time (sec): 74.93 - samples/sec: 1960.87 - lr: 0.000012 - momentum: 0.000000
2023-10-24 18:04:39,955 epoch 8 - iter 990/992 - loss 0.01468421 - time (sec): 83.29 - samples/sec: 1964.56 - lr: 0.000011 - momentum: 0.000000
2023-10-24 18:04:40,104 ----------------------------------------------------------------------------------------------------
2023-10-24 18:04:40,104 EPOCH 8 done: loss 0.0147 - lr: 0.000011
2023-10-24 18:04:43,222 DEV : loss 0.2224731296300888 - f1-score (micro avg)  0.7444
2023-10-24 18:04:43,237 ----------------------------------------------------------------------------------------------------
2023-10-24 18:04:51,724 epoch 9 - iter 99/992 - loss 0.01508641 - time (sec): 8.49 - samples/sec: 1869.36 - lr: 0.000011 - momentum: 0.000000
2023-10-24 18:04:59,936 epoch 9 - iter 198/992 - loss 0.01126348 - time (sec): 16.70 - samples/sec: 1893.49 - lr: 0.000010 - momentum: 0.000000
2023-10-24 18:05:08,057 epoch 9 - iter 297/992 - loss 0.01011817 - time (sec): 24.82 - samples/sec: 1903.84 - lr: 0.000009 - momentum: 0.000000
2023-10-24 18:05:17,194 epoch 9 - iter 396/992 - loss 0.01014998 - time (sec): 33.96 - samples/sec: 1904.20 - lr: 0.000009 - momentum: 0.000000
2023-10-24 18:05:25,870 epoch 9 - iter 495/992 - loss 0.00901112 - time (sec): 42.63 - samples/sec: 1918.22 - lr: 0.000008 - momentum: 0.000000
2023-10-24 18:05:34,447 epoch 9 - iter 594/992 - loss 0.00957614 - time (sec): 51.21 - samples/sec: 1921.71 - lr: 0.000008 - momentum: 0.000000
2023-10-24 18:05:42,470 epoch 9 - iter 693/992 - loss 0.00985237 - time (sec): 59.23 - samples/sec: 1932.18 - lr: 0.000007 - momentum: 0.000000
2023-10-24 18:05:50,719 epoch 9 - iter 792/992 - loss 0.00966421 - time (sec): 67.48 - samples/sec: 1935.77 - lr: 0.000007 - momentum: 0.000000
2023-10-24 18:05:58,741 epoch 9 - iter 891/992 - loss 0.00952795 - time (sec): 75.50 - samples/sec: 1944.93 - lr: 0.000006 - momentum: 0.000000
2023-10-24 18:06:06,945 epoch 9 - iter 990/992 - loss 0.00951350 - time (sec): 83.71 - samples/sec: 1955.66 - lr: 0.000006 - momentum: 0.000000
2023-10-24 18:06:07,091 ----------------------------------------------------------------------------------------------------
2023-10-24 18:06:07,092 EPOCH 9 done: loss 0.0095 - lr: 0.000006
2023-10-24 18:06:10,221 DEV : loss 0.2356439083814621 - f1-score (micro avg)  0.7551
2023-10-24 18:06:10,236 ----------------------------------------------------------------------------------------------------
2023-10-24 18:06:18,255 epoch 10 - iter 99/992 - loss 0.00471428 - time (sec): 8.02 - samples/sec: 2021.65 - lr: 0.000005 - momentum: 0.000000
2023-10-24 18:06:26,499 epoch 10 - iter 198/992 - loss 0.00492676 - time (sec): 16.26 - samples/sec: 1988.79 - lr: 0.000004 - momentum: 0.000000
2023-10-24 18:06:34,960 epoch 10 - iter 297/992 - loss 0.00537611 - time (sec): 24.72 - samples/sec: 1985.92 - lr: 0.000004 - momentum: 0.000000
2023-10-24 18:06:43,429 epoch 10 - iter 396/992 - loss 0.00591290 - time (sec): 33.19 - samples/sec: 1993.67 - lr: 0.000003 - momentum: 0.000000
2023-10-24 18:06:51,659 epoch 10 - iter 495/992 - loss 0.00619826 - time (sec): 41.42 - samples/sec: 1987.82 - lr: 0.000003 - momentum: 0.000000
2023-10-24 18:07:00,038 epoch 10 - iter 594/992 - loss 0.00579102 - time (sec): 49.80 - samples/sec: 1972.80 - lr: 0.000002 - momentum: 0.000000
2023-10-24 18:07:08,440 epoch 10 - iter 693/992 - loss 0.00584032 - time (sec): 58.20 - samples/sec: 1968.97 - lr: 0.000002 - momentum: 0.000000
2023-10-24 18:07:16,506 epoch 10 - iter 792/992 - loss 0.00552839 - time (sec): 66.27 - samples/sec: 1964.77 - lr: 0.000001 - momentum: 0.000000
2023-10-24 18:07:25,019 epoch 10 - iter 891/992 - loss 0.00572974 - time (sec): 74.78 - samples/sec: 1962.97 - lr: 0.000001 - momentum: 0.000000
2023-10-24 18:07:33,501 epoch 10 - iter 990/992 - loss 0.00560021 - time (sec): 83.26 - samples/sec: 1965.24 - lr: 0.000000 - momentum: 0.000000
2023-10-24 18:07:33,670 ----------------------------------------------------------------------------------------------------
2023-10-24 18:07:33,671 EPOCH 10 done: loss 0.0056 - lr: 0.000000
2023-10-24 18:07:36,792 DEV : loss 0.24207349121570587 - f1-score (micro avg)  0.7541
2023-10-24 18:07:37,277 ----------------------------------------------------------------------------------------------------
2023-10-24 18:07:37,277 Loading model from best epoch ...
2023-10-24 18:07:39,090 SequenceTagger predicts: Dictionary with 13 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-24 18:07:41,834 Results:
- F-score (micro) 0.7721
- F-score (macro) 0.6822
- Accuracy 0.6487

By class:
              precision    recall  f1-score   support

         LOC     0.8067    0.8473    0.8265       655
         PER     0.6980    0.7982    0.7448       223
         ORG     0.6400    0.3780    0.4752       127

   micro avg     0.7672    0.7771    0.7721      1005
   macro avg     0.7149    0.6745    0.6822      1005
weighted avg     0.7615    0.7771    0.7640      1005

2023-10-24 18:07:41,834 ----------------------------------------------------------------------------------------------------
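[Editor's note: the micro F-score above is the harmonic mean of the micro precision and recall, 2 * 0.7672 * 0.7771 / (0.7672 + 0.7771) ≈ 0.7721. The best-model.pt checkpoint evaluated here is the one saved at epoch 6 (dev micro-F1 0.7691); it can be loaded for inference with standard Flair calls. A hedged sketch follows; the example sentence is illustrative, and the checkpoint path is the training base path logged above.]

# Load the best checkpoint saved during this run and tag a sentence.
from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load(
    "hmbench-icdar/fr-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs8-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-3/best-model.pt"
)

sentence = Sentence("Paris est la capitale de la France .")
tagger.predict(sentence)
for entity in sentence.get_spans("ner"):
    print(entity)  # spans labeled PER / LOC / ORG (BIOES-encoded internally)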