2024-07-29 04:43:06,934 ----------------------------------------------------------------------------------------------------
2024-07-29 04:43:06,934 Training Model
2024-07-29 04:43:06,934 ----------------------------------------------------------------------------------------------------
2024-07-29 04:43:06,934 Translator(
  (encoder): EncoderLSTM(
    (embedding): Embedding(114, 300, padding_idx=0)
    (dropout): Dropout(p=0.1, inplace=False)
    (lstm): LSTM(300, 512, batch_first=True)
  )
  (decoder): DecoderLSTM(
    (embedding): Embedding(112, 300, padding_idx=0)
    (dropout): Dropout(p=0.1, inplace=False)
    (lstm): LSTM(300, 512, batch_first=True)
    (attention): DotProductAttention(
      (softmax): Softmax(dim=-1)
      (combined2hidden): Sequential(
        (0): Linear(in_features=1024, out_features=512, bias=True)
        (1): ReLU()
      )
    )
    (hidden2vocab): Linear(in_features=512, out_features=112, bias=True)
    (log_softmax): LogSoftmax(dim=-1)
  )
)
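
The printed hierarchy pins down the attention block: encoder and decoder both use 512-dimensional LSTM states, the decoder state is scored against the encoder outputs with a plain dot product, and the concatenated (decoder state, context) vector of size 1024 is projected back to 512 through combined2hidden. The small embedding tables (114 source and 112 target entries) suggest character-level vocabularies. A minimal sketch of the attention block, assuming batch-first tensors as in the LSTMs above (everything beyond the printed names and sizes is an assumption):

import torch
import torch.nn as nn

class DotProductAttention(nn.Module):
    """Sketch of the attention module printed above (shapes assume batch_first=True)."""

    def __init__(self, hidden_size: int = 512):
        super().__init__()
        self.softmax = nn.Softmax(dim=-1)
        self.combined2hidden = nn.Sequential(
            nn.Linear(2 * hidden_size, hidden_size),  # 1024 -> 512, as in the log
            nn.ReLU(),
        )

    def forward(self, decoder_hidden: torch.Tensor, encoder_outputs: torch.Tensor) -> torch.Tensor:
        # decoder_hidden: (batch, 1, 512); encoder_outputs: (batch, src_len, 512)
        scores = torch.bmm(decoder_hidden, encoder_outputs.transpose(1, 2))  # (batch, 1, src_len)
        weights = self.softmax(scores)                                       # attention weights over source positions
        context = torch.bmm(weights, encoder_outputs)                        # (batch, 1, 512)
        combined = torch.cat((decoder_hidden, context), dim=-1)              # (batch, 1, 1024)
        return self.combined2hidden(combined)                                # (batch, 1, 512)
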
2024-07-29 04:43:06,934 ----------------------------------------------------------------------------------------------------
2024-07-29 04:43:06,934 Training Hyperparameters:
2024-07-29 04:43:06,934 - max_epochs: 10
2024-07-29 04:43:06,934 - learning_rate: 0.001
2024-07-29 04:43:06,934 - batch_size: 128
2024-07-29 04:43:06,934 - patience: 5
2024-07-29 04:43:06,934 - scheduler_patience: 3
2024-07-29 04:43:06,934 - teacher_forcing_ratio: 0.5
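
These hyperparameters describe a standard recipe: an optimizer starting at lr 0.001, a scheduler that lowers the learning rate once the dev loss has stagnated for scheduler_patience epochs, early stopping after patience epochs without a new best dev loss, and teacher forcing on half of the training decodes. The training script itself is not part of this log, so the skeleton below is only a sketch of how the logged values could be wired together in PyTorch: the Adam optimizer, the reduction factor of 0.1, the NLLLoss criterion, the model's forward signature, and the evaluate() helper are all assumptions.

import random
import torch
import torch.nn as nn

def train(model: nn.Module, train_batches, dev_batches, evaluate,
          max_epochs: int = 10, learning_rate: float = 1e-3,
          patience: int = 5, scheduler_patience: int = 3,
          teacher_forcing_ratio: float = 0.5) -> None:
    # Only the hyperparameter values come from the log; everything else is assumed.
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.1, patience=scheduler_patience)
    criterion = nn.NLLLoss(ignore_index=0)  # consistent with the LogSoftmax output and padding_idx=0

    best_dev_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(1, max_epochs + 1):
        model.train()
        for source, target in train_batches:
            optimizer.zero_grad()
            # Teacher forcing: with probability 0.5 the decoder sees the gold
            # previous token instead of its own prediction.
            use_teacher_forcing = random.random() < teacher_forcing_ratio
            log_probs = model(source, target, teacher_forcing=use_teacher_forcing)
            loss = criterion(log_probs.flatten(0, 1), target.flatten())
            loss.backward()
            optimizer.step()

        dev_loss = evaluate(model, dev_batches)  # mean dev cross-entropy
        scheduler.step(dev_loss)
        if dev_loss < best_dev_loss:
            best_dev_loss = dev_loss          # "New best score!"
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1   # "No improvement for N epoch(s)"
            if epochs_without_improvement >= patience:
                break                          # "Patience reached: ... early stopping"
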
2024-07-29 04:43:06,934 ----------------------------------------------------------------------------------------------------
2024-07-29 04:43:06,934 Computational Parameters:
2024-07-29 04:43:06,934 - num_workers: 4
2024-07-29 04:43:06,934 - device: device(type='cuda', index=0)
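
num_workers is the number of subprocesses the data loader uses to prepare batches, and the device entry shows the run used the first CUDA GPU. A minimal sketch of that setup (the helper name and pin_memory are assumptions):

import torch
from torch.utils.data import DataLoader

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

def make_loader(dataset, batch_size: int = 128, shuffle: bool = True) -> DataLoader:
    # Four worker processes prepare batches in parallel; pin_memory speeds up
    # host-to-GPU copies and is not visible in the log.
    return DataLoader(dataset, batch_size=batch_size, shuffle=shuffle,
                      num_workers=4, pin_memory=True)
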
2024-07-29 04:43:06,934 ----------------------------------------------------------------------------------------------------
2024-07-29 04:43:06,934 Dataset Splits:
2024-07-29 04:43:06,934 - train: 133623 data points
2024-07-29 04:43:06,934 - dev: 19090 data points
2024-07-29 04:43:06,934 - test: 38179 data points
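
The three splits sum to 190,892 sentence pairs, i.e. roughly 70% train, 10% dev and 20% test. The log does not show how the split was produced; the sketch below reproduces those proportions (up to rounding) with torch.utils.data.random_split, where the seed and the choice of random_split are assumptions:

import torch
from torch.utils.data import Dataset, random_split

def split_dataset(dataset: Dataset, seed: int = 42):
    # Roughly 70/10/20, matching the logged 133623/19090/38179 up to rounding.
    n = len(dataset)
    n_train = int(0.7 * n)
    n_dev = int(0.1 * n)
    n_test = n - n_train - n_dev
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_dev, n_test], generator=generator)
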
2024-07-29 04:43:06,935 ----------------------------------------------------------------------------------------------------
2024-07-29 04:43:06,935 EPOCH 1
2024-07-29 04:46:03,502 batch 104/1044 - loss 2.83783023 - lr 0.0010 - time 176.57s
2024-07-29 04:48:58,358 batch 208/1044 - loss 2.67827428 - lr 0.0010 - time 351.42s
2024-07-29 04:52:09,047 batch 312/1044 - loss 2.59119082 - lr 0.0010 - time 542.11s
2024-07-29 04:55:23,591 batch 416/1044 - loss 2.52991555 - lr 0.0010 - time 736.66s
2024-07-29 04:58:24,345 batch 520/1044 - loss 2.48547669 - lr 0.0010 - time 917.41s
2024-07-29 05:01:11,473 batch 624/1044 - loss 2.44637715 - lr 0.0010 - time 1084.54s
2024-07-29 05:04:20,046 batch 728/1044 - loss 2.41217192 - lr 0.0010 - time 1273.11s
2024-07-29 05:07:28,110 batch 832/1044 - loss 2.37809223 - lr 0.0010 - time 1461.18s
2024-07-29 05:10:38,372 batch 936/1044 - loss 2.34602575 - lr 0.0010 - time 1651.44s
2024-07-29 05:13:32,549 batch 1040/1044 - loss 2.31563680 - lr 0.0010 - time 1825.61s
2024-07-29 05:13:39,106 ----------------------------------------------------------------------------------------------------
2024-07-29 05:13:39,108 EPOCH 1 DONE
2024-07-29 05:14:26,303 TRAIN Loss: 2.3144
2024-07-29 05:14:26,303 DEV Loss: 3.5700
2024-07-29 05:14:26,303 DEV Perplexity: 35.5166
2024-07-29 05:14:26,303 New best score!
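
The reported perplexity is the exponential of the mean dev cross-entropy: exp(3.5700) ≈ 35.52, which matches the DEV Perplexity of 35.5166 up to rounding of the printed loss. Epoch 1 also turns out to be the best epoch of the whole run, since the dev loss never falls below this value again.

import math

def perplexity(mean_cross_entropy: float) -> float:
    # Perplexity as reported in this log: exp of the mean per-token loss.
    return math.exp(mean_cross_entropy)

print(perplexity(3.5700))  # ~35.52, cf. the reported 35.5166
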
2024-07-29 05:14:26,305 ----------------------------------------------------------------------------------------------------
2024-07-29 05:14:26,305 EPOCH 2
2024-07-29 05:17:25,271 batch 104/1044 - loss 2.02556723 - lr 0.0010 - time 178.97s
2024-07-29 05:20:25,054 batch 208/1044 - loss 2.00942771 - lr 0.0010 - time 358.75s
2024-07-29 05:23:12,883 batch 312/1044 - loss 1.99176520 - lr 0.0010 - time 526.58s
2024-07-29 05:26:08,804 batch 416/1044 - loss 1.97854575 - lr 0.0010 - time 702.50s
2024-07-29 05:29:14,936 batch 520/1044 - loss 1.97086978 - lr 0.0010 - time 888.63s
2024-07-29 05:32:21,237 batch 624/1044 - loss 1.95995870 - lr 0.0010 - time 1074.93s
2024-07-29 05:35:20,854 batch 728/1044 - loss 1.95067503 - lr 0.0010 - time 1254.55s
2024-07-29 05:38:34,956 batch 832/1044 - loss 1.94326082 - lr 0.0010 - time 1448.65s
2024-07-29 05:41:48,006 batch 936/1044 - loss 1.93362772 - lr 0.0010 - time 1641.70s
2024-07-29 05:44:42,067 batch 1040/1044 - loss 1.92524348 - lr 0.0010 - time 1815.76s
2024-07-29 05:44:50,207 ----------------------------------------------------------------------------------------------------
2024-07-29 05:44:50,210 EPOCH 2 DONE
2024-07-29 05:45:37,466 TRAIN Loss: 1.9249
2024-07-29 05:45:37,466 DEV Loss: 3.8374
2024-07-29 05:45:37,466 DEV Perplexity: 46.4067
2024-07-29 05:45:37,466 No improvement for 1 epoch(s)
2024-07-29 05:45:37,466 ----------------------------------------------------------------------------------------------------
2024-07-29 05:45:37,466 EPOCH 3
2024-07-29 05:48:43,560 batch 104/1044 - loss 1.82380688 - lr 0.0010 - time 186.09s
2024-07-29 05:51:53,714 batch 208/1044 - loss 1.82825828 - lr 0.0010 - time 376.25s
2024-07-29 05:55:08,715 batch 312/1044 - loss 1.82657076 - lr 0.0010 - time 571.25s
2024-07-29 05:58:07,203 batch 416/1044 - loss 1.82265144 - lr 0.0010 - time 749.74s
2024-07-29 06:00:58,968 batch 520/1044 - loss 1.81858461 - lr 0.0010 - time 921.50s
2024-07-29 06:03:59,822 batch 624/1044 - loss 1.80977892 - lr 0.0010 - time 1102.36s
2024-07-29 06:07:08,066 batch 728/1044 - loss 1.80312389 - lr 0.0010 - time 1290.60s
2024-07-29 06:10:01,948 batch 832/1044 - loss 1.79834272 - lr 0.0010 - time 1464.48s
2024-07-29 06:12:49,654 batch 936/1044 - loss 1.79244394 - lr 0.0010 - time 1632.19s
2024-07-29 06:15:41,378 batch 1040/1044 - loss 1.78895096 - lr 0.0010 - time 1803.91s
2024-07-29 06:15:47,180 ----------------------------------------------------------------------------------------------------
2024-07-29 06:15:47,183 EPOCH 3 DONE
2024-07-29 06:16:34,306 TRAIN Loss: 1.7889
2024-07-29 06:16:34,306 DEV Loss: 3.8489
2024-07-29 06:16:34,306 DEV Perplexity: 46.9422
2024-07-29 06:16:34,307 No improvement for 2 epoch(s)
2024-07-29 06:16:34,307 ----------------------------------------------------------------------------------------------------
2024-07-29 06:16:34,307 EPOCH 4
2024-07-29 06:19:47,695 batch 104/1044 - loss 1.72615880 - lr 0.0010 - time 193.39s
2024-07-29 06:22:47,789 batch 208/1044 - loss 1.72849645 - lr 0.0010 - time 373.48s
2024-07-29 06:25:49,316 batch 312/1044 - loss 1.72645533 - lr 0.0010 - time 555.01s
2024-07-29 06:28:43,932 batch 416/1044 - loss 1.72066385 - lr 0.0010 - time 729.63s
2024-07-29 06:31:56,479 batch 520/1044 - loss 1.71717779 - lr 0.0010 - time 922.17s
2024-07-29 06:34:57,754 batch 624/1044 - loss 1.71594436 - lr 0.0010 - time 1103.45s
2024-07-29 06:37:51,089 batch 728/1044 - loss 1.71165972 - lr 0.0010 - time 1276.78s
2024-07-29 06:40:52,402 batch 832/1044 - loss 1.70951752 - lr 0.0010 - time 1458.10s
2024-07-29 06:43:46,624 batch 936/1044 - loss 1.70553106 - lr 0.0010 - time 1632.32s
2024-07-29 06:46:41,386 batch 1040/1044 - loss 1.70329877 - lr 0.0010 - time 1807.08s
2024-07-29 06:46:48,093 ----------------------------------------------------------------------------------------------------
2024-07-29 06:46:48,095 EPOCH 4 DONE
2024-07-29 06:47:35,218 TRAIN Loss: 1.7032
2024-07-29 06:47:35,219 DEV Loss: 4.1957
2024-07-29 06:47:35,219 DEV Perplexity: 66.3981
2024-07-29 06:47:35,219 No improvement for 3 epoch(s)
2024-07-29 06:47:35,219 ----------------------------------------------------------------------------------------------------
2024-07-29 06:47:35,219 EPOCH 5
2024-07-29 06:50:45,524 batch 104/1044 - loss 1.64844567 - lr 0.0010 - time 190.31s
2024-07-29 06:53:48,606 batch 208/1044 - loss 1.64985944 - lr 0.0010 - time 373.39s
2024-07-29 06:56:52,667 batch 312/1044 - loss 1.65055201 - lr 0.0010 - time 557.45s
2024-07-29 06:59:51,714 batch 416/1044 - loss 1.65345511 - lr 0.0010 - time 736.50s
2024-07-29 07:02:52,445 batch 520/1044 - loss 1.65111495 - lr 0.0010 - time 917.23s
2024-07-29 07:06:00,096 batch 624/1044 - loss 1.65081866 - lr 0.0010 - time 1104.88s
2024-07-29 07:09:16,066 batch 728/1044 - loss 1.64957887 - lr 0.0010 - time 1300.85s
2024-07-29 07:12:15,087 batch 832/1044 - loss 1.64832800 - lr 0.0010 - time 1479.87s
2024-07-29 07:15:10,030 batch 936/1044 - loss 1.64612010 - lr 0.0010 - time 1654.81s
2024-07-29 07:18:02,140 batch 1040/1044 - loss 1.64496474 - lr 0.0010 - time 1826.92s
2024-07-29 07:18:08,591 ----------------------------------------------------------------------------------------------------
2024-07-29 07:18:08,594 EPOCH 5 DONE
2024-07-29 07:18:55,835 TRAIN Loss: 1.6448
2024-07-29 07:18:55,835 DEV Loss: 4.0923
2024-07-29 07:18:55,835 DEV Perplexity: 59.8790
2024-07-29 07:18:55,835 No improvement for 4 epoch(s)
2024-07-29 07:18:55,835 ----------------------------------------------------------------------------------------------------
2024-07-29 07:18:55,835 EPOCH 6
2024-07-29 07:21:53,160 batch 104/1044 - loss 1.58821843 - lr 0.0001 - time 177.32s
2024-07-29 07:24:44,349 batch 208/1044 - loss 1.59108787 - lr 0.0001 - time 348.51s
2024-07-29 07:27:37,622 batch 312/1044 - loss 1.58441215 - lr 0.0001 - time 521.79s
2024-07-29 07:30:43,750 batch 416/1044 - loss 1.58090937 - lr 0.0001 - time 707.91s
2024-07-29 07:33:54,621 batch 520/1044 - loss 1.58090223 - lr 0.0001 - time 898.79s
2024-07-29 07:36:52,832 batch 624/1044 - loss 1.58009594 - lr 0.0001 - time 1077.00s
2024-07-29 07:40:09,071 batch 728/1044 - loss 1.57836947 - lr 0.0001 - time 1273.24s
2024-07-29 07:43:11,085 batch 832/1044 - loss 1.57711583 - lr 0.0001 - time 1455.25s
2024-07-29 07:46:18,514 batch 936/1044 - loss 1.57624354 - lr 0.0001 - time 1642.68s
2024-07-29 07:49:05,093 batch 1040/1044 - loss 1.57536047 - lr 0.0001 - time 1809.26s
2024-07-29 07:49:11,696 ----------------------------------------------------------------------------------------------------
2024-07-29 07:49:11,699 EPOCH 6 DONE
2024-07-29 07:49:59,010 TRAIN Loss: 1.5752
2024-07-29 07:49:59,010 DEV Loss: 4.1991
2024-07-29 07:49:59,010 DEV Perplexity: 66.6274
2024-07-29 07:49:59,010 No improvement for 5 epoch(s)
2024-07-29 07:49:59,010 Patience reached: Terminating model training due to early stopping
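
Two hyperparameters are visible in this tail of the run. The batch lines of epoch 6 show the learning rate cut from 0.0010 to 0.0001 after the dev loss had stagnated for more than scheduler_patience = 3 epochs; that is a factor-of-ten cut, so a reduction factor of 0.1 if a ReduceLROnPlateau-style scheduler was used (an assumption). And training stops here because epochs 2 through 6 are five consecutive epochs without a new best dev loss, exhausting patience = 5.

initial_lr, factor = 1e-3, 0.1   # factor assumed; only the two lr values appear in the log
print(initial_lr * factor)       # 0.0001, the learning rate logged throughout epoch 6
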
2024-07-29 07:49:59,010 ----------------------------------------------------------------------------------------------------
2024-07-29 07:49:59,010 Finished Training
2024-07-29 07:51:31,366 TEST Perplexity: 35.5327
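
The test perplexity (35.5327) is almost identical to the best dev perplexity from epoch 1 (35.5166) rather than to the much higher values of the later epochs, which suggests the best-on-dev weights were restored before the final evaluation. That step is not shown in the log, so the restore below, including the file name, is an assumption.

import torch
import torch.nn as nn

def load_best_checkpoint(model: nn.Module, path: str = "best-model.pt",
                         device: str = "cpu") -> nn.Module:
    # Hypothetical restore of the best-on-dev weights before testing.
    model.load_state_dict(torch.load(path, map_location=device))
    return model.eval()
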
2024-07-29 08:02:43,738 TEST BLEU = 4.47 45.6/8.8/2.0/0.5 (BP = 1.000 ratio = 1.000 hyp_len = 103 ref_len = 103)
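
The BLEU line is a standard corpus-BLEU report: overall score 4.47, the four n-gram precisions (45.6/8.8/2.0/0.5), the brevity penalty, and the hypothesis/reference lengths. Its formatting matches sacreBLEU's output, though which tool produced it is an assumption. A toy reproduction of such a report:

import sacrebleu

# Toy inputs: in the actual evaluation, hypotheses would be the model's test-set
# translations and references the gold translations (a single reference stream).
hypotheses = ["the cat sat on the mat"]
references = [["the cat is on the mat"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(bleu)  # e.g. "BLEU = ... n-gram precisions (BP = ... ratio = ... hyp_len = ... ref_len = ...)"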