2024-07-29 10:41:51,295 ----------------------------------------------------------------------------------------------------
2024-07-29 10:41:51,295 Training Model
2024-07-29 10:41:51,295 ----------------------------------------------------------------------------------------------------
2024-07-29 10:41:51,295 Translator(
  (encoder): EncoderLSTM(
    (embedding): Embedding(14303, 300, padding_idx=0)
    (dropout): Dropout(p=0.1, inplace=False)
    (lstm): LSTM(300, 512, batch_first=True, bidirectional=True)
  )
  (decoder): DecoderLSTM(
    (embedding): Embedding(22834, 300, padding_idx=0)
    (dropout): Dropout(p=0.1, inplace=False)
    (lstm): LSTM(300, 1024, batch_first=True)
    (hidden2vocab): Linear(in_features=1024, out_features=22834, bias=True)
    (log_softmax): LogSoftmax(dim=-1)
  )
)
2024-07-29 10:41:51,295 ----------------------------------------------------------------------------------------------------
2024-07-29 10:41:51,295 Training Hyperparameters:
2024-07-29 10:41:51,295 - max_epochs: 10
2024-07-29 10:41:51,295 - learning_rate: 0.001
2024-07-29 10:41:51,295 - batch_size: 128
2024-07-29 10:41:51,295 - patience: 5
2024-07-29 10:41:51,295 - scheduler_patience: 3
2024-07-29 10:41:51,295 - teacher_forcing_ratio: 0.5
2024-07-29 10:41:51,295 ----------------------------------------------------------------------------------------------------
2024-07-29 10:41:51,295 Computational Parameters:
2024-07-29 10:41:51,295 - num_workers: 4
2024-07-29 10:41:51,295 - device: device(type='cuda', index=0)
2024-07-29 10:41:51,295 ----------------------------------------------------------------------------------------------------
2024-07-29 10:41:51,295 Dataset Splits:
2024-07-29 10:41:51,295 - train: 133623 data points
2024-07-29 10:41:51,295 - dev: 19090 data points
2024-07-29 10:41:51,296 - test: 38179 data points
2024-07-29 10:41:51,296 ----------------------------------------------------------------------------------------------------
2024-07-29 10:41:51,296 EPOCH 1
2024-07-29 10:42:43,980 batch 104/1044 - loss 6.56599054 - lr 0.0010 - time 52.68s
2024-07-29 10:43:35,196 batch 208/1044 - loss 6.28009422 - lr 0.0010 - time 103.90s
2024-07-29 10:44:25,168 batch 312/1044 - loss 6.11249907 - lr 0.0010 - time 153.87s
2024-07-29 10:45:15,557 batch 416/1044 - loss 5.99013720 - lr 0.0010 - time 204.26s
2024-07-29 10:46:02,970 batch 520/1044 - loss 5.89236221 - lr 0.0010 - time 251.67s
2024-07-29 10:46:51,664 batch 624/1044 - loss 5.81345889 - lr 0.0010 - time 300.37s
2024-07-29 10:47:42,555 batch 728/1044 - loss 5.74780520 - lr 0.0010 - time 351.26s
2024-07-29 10:48:33,483 batch 832/1044 - loss 5.69103370 - lr 0.0010 - time 402.19s
2024-07-29 10:49:22,573 batch 936/1044 - loss 5.63910694 - lr 0.0010 - time 451.28s
2024-07-29 10:50:14,318 batch 1040/1044 - loss 5.59255154 - lr 0.0010 - time 503.02s
2024-07-29 10:50:16,234 ----------------------------------------------------------------------------------------------------
2024-07-29 10:50:16,235 EPOCH 1 DONE
2024-07-29 10:50:29,064 TRAIN Loss: 5.5908
2024-07-29 10:50:29,064 DEV Loss: 5.7897
2024-07-29 10:50:29,064 DEV Perplexity: 326.8995
2024-07-29 10:50:29,064 New best score!
2024-07-29 10:50:29,065 ----------------------------------------------------------------------------------------------------
2024-07-29 10:50:29,065 EPOCH 2
2024-07-29 10:51:19,942 batch 104/1044 - loss 5.02739687 - lr 0.0010 - time 50.88s
2024-07-29 10:52:09,803 batch 208/1044 - loss 5.01800949 - lr 0.0010 - time 100.74s
2024-07-29 10:53:04,478 batch 312/1044 - loss 5.00509294 - lr 0.0010 - time 155.41s
2024-07-29 10:53:53,594 batch 416/1044 - loss 4.98731034 - lr 0.0010 - time 204.53s
2024-07-29 10:54:43,356 batch 520/1044 - loss 4.97219816 - lr 0.0010 - time 254.29s
2024-07-29 10:55:33,584 batch 624/1044 - loss 4.96074294 - lr 0.0010 - time 304.52s
2024-07-29 10:56:24,225 batch 728/1044 - loss 4.94472581 - lr 0.0010 - time 355.16s
2024-07-29 10:57:14,355 batch 832/1044 - loss 4.93236568 - lr 0.0010 - time 405.29s
2024-07-29 10:58:06,416 batch 936/1044 - loss 4.91768116 - lr 0.0010 - time 457.35s
2024-07-29 10:58:55,326 batch 1040/1044 - loss 4.90590350 - lr 0.0010 - time 506.26s
2024-07-29 10:58:57,793 ----------------------------------------------------------------------------------------------------
2024-07-29 10:58:57,794 EPOCH 2 DONE
2024-07-29 10:59:10,716 TRAIN Loss: 4.9057
2024-07-29 10:59:10,716 DEV Loss: 5.7132
2024-07-29 10:59:10,716 DEV Perplexity: 302.8460
2024-07-29 10:59:10,716 New best score!
2024-07-29 10:59:10,717 ----------------------------------------------------------------------------------------------------
2024-07-29 10:59:10,717 EPOCH 3
2024-07-29 11:00:00,967 batch 104/1044 - loss 4.61237117 - lr 0.0010 - time 50.25s
2024-07-29 11:00:50,203 batch 208/1044 - loss 4.62395983 - lr 0.0010 - time 99.49s
2024-07-29 11:01:40,773 batch 312/1044 - loss 4.61521491 - lr 0.0010 - time 150.06s
2024-07-29 11:02:39,135 batch 416/1044 - loss 4.61224452 - lr 0.0010 - time 208.42s
2024-07-29 11:03:30,115 batch 520/1044 - loss 4.60275617 - lr 0.0010 - time 259.40s
2024-07-29 11:04:16,479 batch 624/1044 - loss 4.59871728 - lr 0.0010 - time 305.76s
2024-07-29 11:05:07,213 batch 728/1044 - loss 4.59086315 - lr 0.0010 - time 356.50s
2024-07-29 11:05:57,731 batch 832/1044 - loss 4.58489406 - lr 0.0010 - time 407.01s
2024-07-29 11:06:46,315 batch 936/1044 - loss 4.57758889 - lr 0.0010 - time 455.60s
2024-07-29 11:07:34,550 batch 1040/1044 - loss 4.56970717 - lr 0.0010 - time 503.83s
2024-07-29 11:07:36,805 ----------------------------------------------------------------------------------------------------
2024-07-29 11:07:36,806 EPOCH 3 DONE
2024-07-29 11:07:49,727 TRAIN Loss: 4.5697
2024-07-29 11:07:49,728 DEV Loss: 5.5772
2024-07-29 11:07:49,728 DEV Perplexity: 264.3216
2024-07-29 11:07:49,728 New best score!
2024-07-29 11:07:49,729 ----------------------------------------------------------------------------------------------------
2024-07-29 11:07:49,729 EPOCH 4
2024-07-29 11:08:37,495 batch 104/1044 - loss 4.31793029 - lr 0.0010 - time 47.77s
2024-07-29 11:09:28,126 batch 208/1044 - loss 4.31731233 - lr 0.0010 - time 98.40s
2024-07-29 11:10:17,772 batch 312/1044 - loss 4.31730254 - lr 0.0010 - time 148.04s
2024-07-29 11:11:11,397 batch 416/1044 - loss 4.31653301 - lr 0.0010 - time 201.67s
2024-07-29 11:12:01,968 batch 520/1044 - loss 4.32179287 - lr 0.0010 - time 252.24s
2024-07-29 11:12:52,660 batch 624/1044 - loss 4.32694693 - lr 0.0010 - time 302.93s
2024-07-29 11:13:42,592 batch 728/1044 - loss 4.32466568 - lr 0.0010 - time 352.86s
2024-07-29 11:14:33,972 batch 832/1044 - loss 4.32141261 - lr 0.0010 - time 404.24s
2024-07-29 11:15:25,165 batch 936/1044 - loss 4.31979928 - lr 0.0010 - time 455.44s
2024-07-29 11:16:12,909 batch 1040/1044 - loss 4.31766206 - lr 0.0010 - time 503.18s
2024-07-29 11:16:14,722 ----------------------------------------------------------------------------------------------------
2024-07-29 11:16:14,723 EPOCH 4 DONE
2024-07-29 11:16:27,582 TRAIN Loss: 4.3179
2024-07-29 11:16:27,583 DEV Loss: 5.5061
2024-07-29 11:16:27,583 DEV Perplexity: 246.1892
2024-07-29 11:16:27,583 New best score!
2024-07-29 11:16:27,584 ----------------------------------------------------------------------------------------------------
2024-07-29 11:16:27,584 EPOCH 5
2024-07-29 11:17:19,390 batch 104/1044 - loss 4.10156497 - lr 0.0010 - time 51.81s
2024-07-29 11:18:08,736 batch 208/1044 - loss 4.09617276 - lr 0.0010 - time 101.15s
2024-07-29 11:18:57,718 batch 312/1044 - loss 4.10813874 - lr 0.0010 - time 150.13s
2024-07-29 11:19:53,482 batch 416/1044 - loss 4.11702962 - lr 0.0010 - time 205.90s
2024-07-29 11:20:43,773 batch 520/1044 - loss 4.11525546 - lr 0.0010 - time 256.19s
2024-07-29 11:21:34,723 batch 624/1044 - loss 4.11790551 - lr 0.0010 - time 307.14s
2024-07-29 11:22:23,462 batch 728/1044 - loss 4.12154044 - lr 0.0010 - time 355.88s
2024-07-29 11:23:11,115 batch 832/1044 - loss 4.12138260 - lr 0.0010 - time 403.53s
2024-07-29 11:24:00,337 batch 936/1044 - loss 4.12506736 - lr 0.0010 - time 452.75s
2024-07-29 11:24:50,964 batch 1040/1044 - loss 4.12429898 - lr 0.0010 - time 503.38s
2024-07-29 11:24:52,983 ----------------------------------------------------------------------------------------------------
2024-07-29 11:24:52,984 EPOCH 5 DONE
2024-07-29 11:25:05,723 TRAIN Loss: 4.1247
2024-07-29 11:25:05,723 DEV Loss: 5.4289
2024-07-29 11:25:05,723 DEV Perplexity: 227.8912
2024-07-29 11:25:05,723 New best score!
2024-07-29 11:25:05,724 ----------------------------------------------------------------------------------------------------
2024-07-29 11:25:05,724 EPOCH 6
2024-07-29 11:25:59,338 batch 104/1044 - loss 3.89071036 - lr 0.0010 - time 53.61s
2024-07-29 11:26:50,131 batch 208/1044 - loss 3.91066583 - lr 0.0010 - time 104.41s
2024-07-29 11:27:39,605 batch 312/1044 - loss 3.92001536 - lr 0.0010 - time 153.88s
2024-07-29 11:28:26,705 batch 416/1044 - loss 3.91852045 - lr 0.0010 - time 200.98s
2024-07-29 11:29:21,163 batch 520/1044 - loss 3.92671625 - lr 0.0010 - time 255.44s
2024-07-29 11:30:09,942 batch 624/1044 - loss 3.93454336 - lr 0.0010 - time 304.22s
2024-07-29 11:31:02,918 batch 728/1044 - loss 3.94077764 - lr 0.0010 - time 357.19s
2024-07-29 11:31:53,528 batch 832/1044 - loss 3.94676249 - lr 0.0010 - time 407.80s
2024-07-29 11:32:41,961 batch 936/1044 - loss 3.95203299 - lr 0.0010 - time 456.24s
2024-07-29 11:33:31,394 batch 1040/1044 - loss 3.95468071 - lr 0.0010 - time 505.67s
2024-07-29 11:33:33,260 ----------------------------------------------------------------------------------------------------
2024-07-29 11:33:33,261 EPOCH 6 DONE
2024-07-29 11:33:46,131 TRAIN Loss: 3.9546
2024-07-29 11:33:46,131 DEV Loss: 5.4532
2024-07-29 11:33:46,131 DEV Perplexity: 233.4940
2024-07-29 11:33:46,131 No improvement for 1 epoch(s)
2024-07-29 11:33:46,131 ----------------------------------------------------------------------------------------------------
2024-07-29 11:33:46,131 EPOCH 7
2024-07-29 11:34:36,037 batch 104/1044 - loss 3.75202120 - lr 0.0010 - time 49.91s
2024-07-29 11:35:23,211 batch 208/1044 - loss 3.76428310 - lr 0.0010 - time 97.08s
2024-07-29 11:36:15,737 batch 312/1044 - loss 3.76220069 - lr 0.0010 - time 149.61s
2024-07-29 11:37:06,599 batch 416/1044 - loss 3.76866076 - lr 0.0010 - time 200.47s
2024-07-29 11:37:57,102 batch 520/1044 - loss 3.78008501 - lr 0.0010 - time 250.97s
2024-07-29 11:38:48,470 batch 624/1044 - loss 3.78899940 - lr 0.0010 - time 302.34s
2024-07-29 11:39:37,561 batch 728/1044 - loss 3.79675758 - lr 0.0010 - time 351.43s
2024-07-29 11:40:26,884 batch 832/1044 - loss 3.80079628 - lr 0.0010 - time 400.75s
2024-07-29 11:41:15,559 batch 936/1044 - loss 3.80748021 - lr 0.0010 - time 449.43s
2024-07-29 11:42:04,567 batch 1040/1044 - loss 3.81313193 - lr 0.0010 - time 498.44s
2024-07-29 11:42:06,383 ----------------------------------------------------------------------------------------------------
2024-07-29 11:42:06,384 EPOCH 7 DONE
2024-07-29 11:42:19,326 TRAIN Loss: 3.8132
2024-07-29 11:42:19,326 DEV Loss: 5.4459
2024-07-29 11:42:19,326 DEV Perplexity: 231.8110
2024-07-29 11:42:19,326 No improvement for 2 epoch(s)
2024-07-29 11:42:19,326 ----------------------------------------------------------------------------------------------------
2024-07-29 11:42:19,326 EPOCH 8
2024-07-29 11:43:11,878 batch 104/1044 - loss 3.64161499 - lr 0.0010 - time 52.55s
2024-07-29 11:44:02,812 batch 208/1044 - loss 3.66078973 - lr 0.0010 - time 103.49s
2024-07-29 11:44:54,581 batch 312/1044 - loss 3.66940373 - lr 0.0010 - time 155.26s
2024-07-29 11:45:42,502 batch 416/1044 - loss 3.67283917 - lr 0.0010 - time 203.18s
2024-07-29 11:46:32,748 batch 520/1044 - loss 3.67896443 - lr 0.0010 - time 253.42s
2024-07-29 11:47:19,611 batch 624/1044 - loss 3.68378819 - lr 0.0010 - time 300.28s
2024-07-29 11:48:12,844 batch 728/1044 - loss 3.68957532 - lr 0.0010 - time 353.52s
2024-07-29 11:49:01,503 batch 832/1044 - loss 3.69448218 - lr 0.0010 - time 402.18s
2024-07-29 11:49:51,030 batch 936/1044 - loss 3.70412089 - lr 0.0010 - time 451.70s
2024-07-29 11:50:42,780 batch 1040/1044 - loss 3.70785985 - lr 0.0010 - time 503.45s
2024-07-29 11:50:44,516 ----------------------------------------------------------------------------------------------------
2024-07-29 11:50:44,517 EPOCH 8 DONE
2024-07-29 11:50:57,332 TRAIN Loss: 3.7082
2024-07-29 11:50:57,332 DEV Loss: 5.4909
2024-07-29 11:50:57,332 DEV Perplexity: 242.4722
2024-07-29 11:50:57,332 No improvement for 3 epoch(s)
2024-07-29 11:50:57,332 ----------------------------------------------------------------------------------------------------
2024-07-29 11:50:57,332 EPOCH 9
2024-07-29 11:51:48,335 batch 104/1044 - loss 3.51649693 - lr 0.0010 - time 51.00s
2024-07-29 11:52:36,237 batch 208/1044 - loss 3.53223671 - lr 0.0010 - time 98.90s
2024-07-29 11:53:24,958 batch 312/1044 - loss 3.54294675 - lr 0.0010 - time 147.63s
2024-07-29 11:54:15,450 batch 416/1044 - loss 3.55195141 - lr 0.0010 - time 198.12s
2024-07-29 11:55:05,536 batch 520/1044 - loss 3.55877144 - lr 0.0010 - time 248.20s
2024-07-29 11:56:00,052 batch 624/1044 - loss 3.56457711 - lr 0.0010 - time 302.72s
2024-07-29 11:56:51,205 batch 728/1044 - loss 3.57586517 - lr 0.0010 - time 353.87s
2024-07-29 11:57:39,310 batch 832/1044 - loss 3.58044331 - lr 0.0010 - time 401.98s
2024-07-29 11:58:31,929 batch 936/1044 - loss 3.58557119 - lr 0.0010 - time 454.60s
2024-07-29 11:59:20,796 batch 1040/1044 - loss 3.59259140 - lr 0.0010 - time 503.46s
2024-07-29 11:59:22,537 ----------------------------------------------------------------------------------------------------
2024-07-29 11:59:22,537 EPOCH 9 DONE
2024-07-29 11:59:35,430 TRAIN Loss: 3.5929
2024-07-29 11:59:35,430 DEV Loss: 5.5051
2024-07-29 11:59:35,430 DEV Perplexity: 245.9499
2024-07-29 11:59:35,430 No improvement for 4 epoch(s)
2024-07-29 11:59:35,430 ----------------------------------------------------------------------------------------------------
2024-07-29 11:59:35,430 EPOCH 10
2024-07-29 12:00:26,025 batch 104/1044 - loss 3.40724007 - lr 0.0001 - time 50.59s
2024-07-29 12:01:13,195 batch 208/1044 - loss 3.39549031 - lr 0.0001 - time 97.76s
2024-07-29 12:02:02,790 batch 312/1044 - loss 3.38177447 - lr 0.0001 - time 147.36s
2024-07-29 12:02:54,277 batch 416/1044 - loss 3.37592916 - lr 0.0001 - time 198.85s
2024-07-29 12:03:45,463 batch 520/1044 - loss 3.37005038 - lr 0.0001 - time 250.03s
2024-07-29 12:04:33,863 batch 624/1044 - loss 3.37062476 - lr 0.0001 - time 298.43s
2024-07-29 12:05:26,246 batch 728/1044 - loss 3.37177335 - lr 0.0001 - time 350.82s
2024-07-29 12:06:15,720 batch 832/1044 - loss 3.37051947 - lr 0.0001 - time 400.29s
2024-07-29 12:07:04,090 batch 936/1044 - loss 3.36859261 - lr 0.0001 - time 448.66s
2024-07-29 12:07:56,434 batch 1040/1044 - loss 3.36913985 - lr 0.0001 - time 501.00s
2024-07-29 12:07:58,775 ----------------------------------------------------------------------------------------------------
2024-07-29 12:07:58,776 EPOCH 10 DONE
2024-07-29 12:08:11,643 TRAIN Loss: 3.3688
2024-07-29 12:08:11,644 DEV Loss: 5.5078
2024-07-29 12:08:11,644 DEV Perplexity: 246.6117
2024-07-29 12:08:11,644 No improvement for 5 epoch(s)
2024-07-29 12:08:11,644 Patience reached: Terminating model training due to early stopping
2024-07-29 12:08:11,644 ----------------------------------------------------------------------------------------------------
2024-07-29 12:08:11,644 Finished Training
2024-07-29 12:08:36,837 TEST Perplexity: 227.3162
2024-07-29 12:11:57,875 TEST BLEU = 12.94 77.1/52.4/11.1/0.6 (BP = 1.000 ratio = 1.000 hyp_len = 83 ref_len = 83)