2024-07-29 09:42:42,423 ----------------------------------------------------------------------------------------------------
2024-07-29 09:42:42,423 Training Model
2024-07-29 09:42:42,423 ----------------------------------------------------------------------------------------------------
2024-07-29 09:42:42,423 Translator(
  (encoder): EncoderLSTM(
    (embedding): Embedding(22834, 300, padding_idx=0)
    (dropout): Dropout(p=0.1, inplace=False)
    (lstm): LSTM(300, 512, batch_first=True)
  )
  (decoder): DecoderLSTM(
    (embedding): Embedding(14303, 300, padding_idx=0)
    (dropout): Dropout(p=0.1, inplace=False)
    (lstm): LSTM(300, 512, batch_first=True)
    (attention): DotProductAttention(
      (softmax): Softmax(dim=-1)
      (combined2hidden): Sequential(
        (0): Linear(in_features=1024, out_features=512, bias=True)
        (1): ReLU()
      )
    )
    (hidden2vocab): Linear(in_features=512, out_features=14303, bias=True)
    (log_softmax): LogSoftmax(dim=-1)
  )
)
2024-07-29 09:42:42,423 ----------------------------------------------------------------------------------------------------
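Note: the module summary above is enough to estimate the model size. The sketch below counts trainable parameters by hand, assuming each LSTM is a single unidirectional layer with biases (the PyTorch defaults, which the printed `LSTM(300, 512, batch_first=True)` suggests) and that no weights are shared between modules. The helper names are illustrative and not taken from the training code.

```python
def lstm_params(input_size: int, hidden_size: int) -> int:
    # One unidirectional layer: W_ih and W_hh for 4 gates, plus b_ih and b_hh.
    weights = 4 * hidden_size * (input_size + hidden_size)
    biases = 2 * 4 * hidden_size
    return weights + biases


def linear_params(in_features: int, out_features: int) -> int:
    # Weight matrix plus bias vector.
    return in_features * out_features + out_features


# Sizes copied from the module summary in the log.
counts = {
    "encoder.embedding": 22834 * 300,
    "encoder.lstm": lstm_params(300, 512),
    "decoder.embedding": 14303 * 300,
    "decoder.lstm": lstm_params(300, 512),
    "decoder.attention.combined2hidden": linear_params(1024, 512),
    "decoder.hidden2vocab": linear_params(512, 14303),
}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name:36s} {n:>12,d}")
print(f"{'total':36s} {total:>12,d}")
```

Under these assumptions the two embeddings and the output projection account for most of the parameters; the recurrent layers are comparatively small.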
2024-07-29 09:42:42,423 Training Hyperparameters:
2024-07-29 09:42:42,423 - max_epochs: 10
2024-07-29 09:42:42,423 - learning_rate: 0.001
2024-07-29 09:42:42,423 - batch_size: 128
2024-07-29 09:42:42,423 - patience: 5
2024-07-29 09:42:42,423 - scheduler_patience: 3
2024-07-29 09:42:42,423 - teacher_forcing_ratio: 0.5
2024-07-29 09:42:42,423 ----------------------------------------------------------------------------------------------------
2024-07-29 09:42:42,423 Computational Parameters:
2024-07-29 09:42:42,423 - num_workers: 4
2024-07-29 09:42:42,423 - device: device(type='cuda', index=0)
2024-07-29 09:42:42,423 ----------------------------------------------------------------------------------------------------
2024-07-29 09:42:42,423 Dataset Splits:
2024-07-29 09:42:42,423 - train: 133623 data points
2024-07-29 09:42:42,423 - dev: 19090 data points
2024-07-29 09:42:42,423 - test: 38179 data points
2024-07-29 09:42:42,424 ----------------------------------------------------------------------------------------------------
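Note: the `1044` batches per epoch reported below follow from the train split size and the batch size. A minimal check, assuming the final partial batch is kept rather than dropped and that progress is logged every tenth of an epoch (both inferred from the numbers in this log, not confirmed by the training code):

```python
import math

train_size = 133623  # "train: 133623 data points"
batch_size = 128     # "batch_size: 128"

# Keeping the last, smaller batch means rounding up.
batches_per_epoch = math.ceil(train_size / batch_size)
last_batch_size = train_size - (batches_per_epoch - 1) * batch_size

# Assumed logging interval: one tenth of an epoch, rounded down.
log_every = batches_per_epoch // 10

print(batches_per_epoch, last_batch_size, log_every)
```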
2024-07-29 09:42:42,424 EPOCH 1
2024-07-29 09:43:17,154 batch 104/1044 - loss 6.42034669 - lr 0.0010 - time 34.73s
2024-07-29 09:43:54,707 batch 208/1044 - loss 6.17063731 - lr 0.0010 - time 72.28s
2024-07-29 09:44:32,541 batch 312/1044 - loss 6.00517355 - lr 0.0010 - time 110.12s
2024-07-29 09:45:10,795 batch 416/1044 - loss 5.87077612 - lr 0.0010 - time 148.37s
2024-07-29 09:45:47,671 batch 520/1044 - loss 5.75463560 - lr 0.0010 - time 185.25s
2024-07-29 09:46:24,949 batch 624/1044 - loss 5.65824632 - lr 0.0010 - time 222.53s
2024-07-29 09:47:03,552 batch 728/1044 - loss 5.56939856 - lr 0.0010 - time 261.13s
2024-07-29 09:47:40,687 batch 832/1044 - loss 5.49128213 - lr 0.0010 - time 298.26s
2024-07-29 09:48:18,127 batch 936/1044 - loss 5.41966415 - lr 0.0010 - time 335.70s
2024-07-29 09:48:55,119 batch 1040/1044 - loss 5.35489992 - lr 0.0010 - time 372.70s
2024-07-29 09:48:56,680 ----------------------------------------------------------------------------------------------------
2024-07-29 09:48:56,681 EPOCH 1 DONE
2024-07-29 09:49:06,113 TRAIN Loss: 5.3525
2024-07-29 09:49:06,114 DEV Loss: 5.5692
2024-07-29 09:49:06,114 DEV Perplexity: 262.2315
2024-07-29 09:49:06,114 New best score!
2024-07-29 09:49:06,115 ----------------------------------------------------------------------------------------------------
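Note: the reported DEV Perplexity values match the exponential of the DEV Loss, which suggests the loss is a mean per-token negative log-likelihood in nats. The short check below uses loss and perplexity pairs copied from this log; small differences are expected because the logged loss is rounded to four decimals.

```python
import math

# (DEV Loss, DEV Perplexity) pairs copied from epochs 1, 2 and 4 of this log.
logged = [
    (5.5692, 262.2315),
    (5.2857, 197.4908),
    (5.1546, 173.2260),
]

for loss, reported_ppl in logged:
    ppl = math.exp(loss)  # perplexity = exp(mean negative log-likelihood)
    print(f"loss {loss:.4f} -> exp(loss) {ppl:.2f} (logged {reported_ppl:.4f})")
```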
2024-07-29 09:49:06,115 EPOCH 2
2024-07-29 09:49:41,222 batch 104/1044 - loss 4.62738995 - lr 0.0010 - time 35.11s
2024-07-29 09:50:17,864 batch 208/1044 - loss 4.59759969 - lr 0.0010 - time 71.75s
2024-07-29 09:50:53,411 batch 312/1044 - loss 4.57657494 - lr 0.0010 - time 107.30s
2024-07-29 09:51:31,209 batch 416/1044 - loss 4.54348163 - lr 0.0010 - time 145.09s
2024-07-29 09:52:11,697 batch 520/1044 - loss 4.51823422 - lr 0.0010 - time 185.58s
2024-07-29 09:52:48,926 batch 624/1044 - loss 4.49001330 - lr 0.0010 - time 222.81s
2024-07-29 09:53:24,588 batch 728/1044 - loss 4.46876206 - lr 0.0010 - time 258.47s
2024-07-29 09:54:02,468 batch 832/1044 - loss 4.44477118 - lr 0.0010 - time 296.35s
2024-07-29 09:54:39,911 batch 936/1044 - loss 4.42371725 - lr 0.0010 - time 333.80s
2024-07-29 09:55:16,492 batch 1040/1044 - loss 4.40068238 - lr 0.0010 - time 370.38s
2024-07-29 09:55:18,277 ----------------------------------------------------------------------------------------------------
2024-07-29 09:55:18,279 EPOCH 2 DONE
2024-07-29 09:55:27,546 TRAIN Loss: 4.3997
2024-07-29 09:55:27,546 DEV Loss: 5.2857
2024-07-29 09:55:27,546 DEV Perplexity: 197.4908
2024-07-29 09:55:27,546 New best score!
2024-07-29 09:55:27,547 ----------------------------------------------------------------------------------------------------
2024-07-29 09:55:27,547 EPOCH 3
2024-07-29 09:56:04,874 batch 104/1044 - loss 4.04292682 - lr 0.0010 - time 37.33s
2024-07-29 09:56:44,240 batch 208/1044 - loss 4.04458403 - lr 0.0010 - time 76.69s
2024-07-29 09:57:19,595 batch 312/1044 - loss 4.04015087 - lr 0.0010 - time 112.05s
2024-07-29 09:57:58,341 batch 416/1044 - loss 4.03473626 - lr 0.0010 - time 150.79s
2024-07-29 09:58:33,685 batch 520/1044 - loss 4.02294693 - lr 0.0010 - time 186.14s
2024-07-29 09:59:09,374 batch 624/1044 - loss 4.00945110 - lr 0.0010 - time 221.83s
2024-07-29 09:59:49,125 batch 728/1044 - loss 4.00042684 - lr 0.0010 - time 261.58s
2024-07-29 10:00:26,299 batch 832/1044 - loss 3.99049270 - lr 0.0010 - time 298.75s
2024-07-29 10:01:03,713 batch 936/1044 - loss 3.97934972 - lr 0.0010 - time 336.17s
2024-07-29 10:01:40,625 batch 1040/1044 - loss 3.96891846 - lr 0.0010 - time 373.08s
2024-07-29 10:01:41,787 ----------------------------------------------------------------------------------------------------
2024-07-29 10:01:41,789 EPOCH 3 DONE
2024-07-29 10:01:51,163 TRAIN Loss: 3.9687
2024-07-29 10:01:51,163 DEV Loss: 5.2440
2024-07-29 10:01:51,163 DEV Perplexity: 189.4295
2024-07-29 10:01:51,163 New best score!
2024-07-29 10:01:51,164 ----------------------------------------------------------------------------------------------------
2024-07-29 10:01:51,164 EPOCH 4
2024-07-29 10:02:31,057 batch 104/1044 - loss 3.74893653 - lr 0.0010 - time 39.89s
2024-07-29 10:03:05,331 batch 208/1044 - loss 3.75399486 - lr 0.0010 - time 74.17s
2024-07-29 10:03:41,466 batch 312/1044 - loss 3.75771751 - lr 0.0010 - time 110.30s
2024-07-29 10:04:15,960 batch 416/1044 - loss 3.75979321 - lr 0.0010 - time 144.80s
2024-07-29 10:04:55,428 batch 520/1044 - loss 3.75057765 - lr 0.0010 - time 184.26s
2024-07-29 10:05:33,137 batch 624/1044 - loss 3.74305481 - lr 0.0010 - time 221.97s
2024-07-29 10:06:09,059 batch 728/1044 - loss 3.73923583 - lr 0.0010 - time 257.89s
2024-07-29 10:06:47,012 batch 832/1044 - loss 3.73675085 - lr 0.0010 - time 295.85s
2024-07-29 10:07:23,641 batch 936/1044 - loss 3.73419790 - lr 0.0010 - time 332.48s
2024-07-29 10:07:58,748 batch 1040/1044 - loss 3.72953442 - lr 0.0010 - time 367.58s
2024-07-29 10:08:00,245 ----------------------------------------------------------------------------------------------------
2024-07-29 10:08:00,246 EPOCH 4 DONE
2024-07-29 10:08:09,716 TRAIN Loss: 3.7292
2024-07-29 10:08:09,717 DEV Loss: 5.1546
2024-07-29 10:08:09,717 DEV Perplexity: 173.2260
2024-07-29 10:08:09,717 New best score!
2024-07-29 10:08:09,718 ----------------------------------------------------------------------------------------------------
2024-07-29 10:08:09,718 EPOCH 5
2024-07-29 10:08:48,898 batch 104/1044 - loss 3.53810529 - lr 0.0010 - time 39.18s
2024-07-29 10:09:24,261 batch 208/1044 - loss 3.54713277 - lr 0.0010 - time 74.54s
2024-07-29 10:09:59,554 batch 312/1044 - loss 3.55520624 - lr 0.0010 - time 109.84s
2024-07-29 10:10:35,964 batch 416/1044 - loss 3.54529557 - lr 0.0010 - time 146.25s
2024-07-29 10:11:13,273 batch 520/1044 - loss 3.53952308 - lr 0.0010 - time 183.56s
2024-07-29 10:11:49,699 batch 624/1044 - loss 3.53902453 - lr 0.0010 - time 219.98s
2024-07-29 10:12:26,577 batch 728/1044 - loss 3.54207764 - lr 0.0010 - time 256.86s
2024-07-29 10:13:03,988 batch 832/1044 - loss 3.54191658 - lr 0.0010 - time 294.27s
2024-07-29 10:13:44,152 batch 936/1044 - loss 3.54287420 - lr 0.0010 - time 334.43s
2024-07-29 10:14:19,848 batch 1040/1044 - loss 3.54355186 - lr 0.0010 - time 370.13s
2024-07-29 10:14:21,679 ----------------------------------------------------------------------------------------------------
2024-07-29 10:14:21,680 EPOCH 5 DONE
2024-07-29 10:14:31,157 TRAIN Loss: 3.5436
2024-07-29 10:14:31,157 DEV Loss: 5.1595
2024-07-29 10:14:31,157 DEV Perplexity: 174.0773
2024-07-29 10:14:31,157 No improvement for 1 epoch(s)
2024-07-29 10:14:31,157 ----------------------------------------------------------------------------------------------------
2024-07-29 10:14:31,157 EPOCH 6
2024-07-29 10:15:09,004 batch 104/1044 - loss 3.37988193 - lr 0.0010 - time 37.85s
2024-07-29 10:15:46,449 batch 208/1044 - loss 3.39972965 - lr 0.0010 - time 75.29s
2024-07-29 10:16:23,877 batch 312/1044 - loss 3.41839841 - lr 0.0010 - time 112.72s
2024-07-29 10:17:02,860 batch 416/1044 - loss 3.42049147 - lr 0.0010 - time 151.70s
2024-07-29 10:17:39,715 batch 520/1044 - loss 3.42189572 - lr 0.0010 - time 188.56s
2024-07-29 10:18:16,287 batch 624/1044 - loss 3.41934290 - lr 0.0010 - time 225.13s
2024-07-29 10:18:49,350 batch 728/1044 - loss 3.42369204 - lr 0.0010 - time 258.19s
2024-07-29 10:19:27,406 batch 832/1044 - loss 3.42245102 - lr 0.0010 - time 296.25s
2024-07-29 10:20:04,324 batch 936/1044 - loss 3.42058108 - lr 0.0010 - time 333.17s
2024-07-29 10:20:39,261 batch 1040/1044 - loss 3.42255051 - lr 0.0010 - time 368.10s
2024-07-29 10:20:43,715 ----------------------------------------------------------------------------------------------------
2024-07-29 10:20:43,717 EPOCH 6 DONE
2024-07-29 10:20:53,217 TRAIN Loss: 3.4223
2024-07-29 10:20:53,218 DEV Loss: 5.1826
2024-07-29 10:20:53,218 DEV Perplexity: 178.1495
2024-07-29 10:20:53,218 No improvement for 2 epoch(s)
2024-07-29 10:20:53,218 ----------------------------------------------------------------------------------------------------
2024-07-29 10:20:53,218 EPOCH 7
2024-07-29 10:21:31,444 batch 104/1044 - loss 3.29632874 - lr 0.0010 - time 38.23s
2024-07-29 10:22:10,060 batch 208/1044 - loss 3.29179441 - lr 0.0010 - time 76.84s
2024-07-29 10:22:45,065 batch 312/1044 - loss 3.28852440 - lr 0.0010 - time 111.85s
2024-07-29 10:23:21,129 batch 416/1044 - loss 3.29654682 - lr 0.0010 - time 147.91s
2024-07-29 10:23:58,897 batch 520/1044 - loss 3.30062932 - lr 0.0010 - time 185.68s
2024-07-29 10:24:37,910 batch 624/1044 - loss 3.31254658 - lr 0.0010 - time 224.69s
2024-07-29 10:25:15,978 batch 728/1044 - loss 3.31376025 - lr 0.0010 - time 262.76s
2024-07-29 10:25:53,003 batch 832/1044 - loss 3.31953892 - lr 0.0010 - time 299.79s
2024-07-29 10:26:30,024 batch 936/1044 - loss 3.32268426 - lr 0.0010 - time 336.81s
2024-07-29 10:27:05,685 batch 1040/1044 - loss 3.32460238 - lr 0.0010 - time 372.47s
2024-07-29 10:27:06,955 ----------------------------------------------------------------------------------------------------
2024-07-29 10:27:06,957 EPOCH 7 DONE
2024-07-29 10:27:16,539 TRAIN Loss: 3.3246
2024-07-29 10:27:16,539 DEV Loss: 5.2310
2024-07-29 10:27:16,539 DEV Perplexity: 186.9724
2024-07-29 10:27:16,539 No improvement for 3 epoch(s)
2024-07-29 10:27:16,539 ----------------------------------------------------------------------------------------------------
2024-07-29 10:27:16,539 EPOCH 8
2024-07-29 10:27:55,681 batch 104/1044 - loss 3.18067933 - lr 0.0010 - time 39.14s
2024-07-29 10:28:30,973 batch 208/1044 - loss 3.20228673 - lr 0.0010 - time 74.43s
2024-07-29 10:29:06,064 batch 312/1044 - loss 3.20549937 - lr 0.0010 - time 109.53s
2024-07-29 10:29:43,870 batch 416/1044 - loss 3.21897588 - lr 0.0010 - time 147.33s
2024-07-29 10:30:19,159 batch 520/1044 - loss 3.22153870 - lr 0.0010 - time 182.62s
2024-07-29 10:30:55,565 batch 624/1044 - loss 3.22599725 - lr 0.0010 - time 219.03s
2024-07-29 10:31:33,714 batch 728/1044 - loss 3.22878759 - lr 0.0010 - time 257.18s
2024-07-29 10:32:10,440 batch 832/1044 - loss 3.23212968 - lr 0.0010 - time 293.90s
2024-07-29 10:32:48,422 batch 936/1044 - loss 3.23624962 - lr 0.0010 - time 331.88s
2024-07-29 10:33:24,964 batch 1040/1044 - loss 3.23659680 - lr 0.0010 - time 368.42s
2024-07-29 10:33:26,214 ----------------------------------------------------------------------------------------------------
2024-07-29 10:33:26,216 EPOCH 8 DONE
2024-07-29 10:33:35,755 TRAIN Loss: 3.2367
2024-07-29 10:33:35,756 DEV Loss: 5.2968
2024-07-29 10:33:35,756 DEV Perplexity: 199.6878
2024-07-29 10:33:35,756 No improvement for 4 epoch(s)
2024-07-29 10:33:35,756 ----------------------------------------------------------------------------------------------------
2024-07-29 10:33:35,756 EPOCH 9
2024-07-29 10:34:15,083 batch 104/1044 - loss 3.08033091 - lr 0.0001 - time 39.33s
2024-07-29 10:34:52,691 batch 208/1044 - loss 3.07522689 - lr 0.0001 - time 76.93s
2024-07-29 10:35:29,151 batch 312/1044 - loss 3.06626054 - lr 0.0001 - time 113.39s
2024-07-29 10:36:06,720 batch 416/1044 - loss 3.06839789 - lr 0.0001 - time 150.96s
2024-07-29 10:36:41,167 batch 520/1044 - loss 3.06539460 - lr 0.0001 - time 185.41s
2024-07-29 10:37:17,074 batch 624/1044 - loss 3.06574041 - lr 0.0001 - time 221.32s
2024-07-29 10:37:54,392 batch 728/1044 - loss 3.06843089 - lr 0.0001 - time 258.64s
2024-07-29 10:38:31,689 batch 832/1044 - loss 3.06777010 - lr 0.0001 - time 295.93s
2024-07-29 10:39:06,956 batch 936/1044 - loss 3.06646013 - lr 0.0001 - time 331.20s
2024-07-29 10:39:45,993 batch 1040/1044 - loss 3.06478271 - lr 0.0001 - time 370.24s
2024-07-29 10:39:47,096 ----------------------------------------------------------------------------------------------------
2024-07-29 10:39:47,098 EPOCH 9 DONE
2024-07-29 10:39:56,496 TRAIN Loss: 3.0646
2024-07-29 10:39:56,497 DEV Loss: 5.1945
2024-07-29 10:39:56,497 DEV Perplexity: 180.2739
2024-07-29 10:39:56,497 No improvement for 5 epoch(s)
2024-07-29 10:39:56,497 Patience reached: Terminating model training due to early stopping
2024-07-29 10:39:56,497 ----------------------------------------------------------------------------------------------------
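Note: the learning-rate drop at epoch 9 and the stop after epoch 9 are consistent with a plateau scheduler (`scheduler_patience: 3`, reduction factor 0.1) combined with early stopping (`patience: 5`), both monitoring DEV Loss. The sketch below replays the logged DEV Loss values through that logic. It is a simplified reconstruction: the factor of 0.1 is inferred from the change `0.0010` to `0.0001`, and the improvement thresholds a real scheduler may apply are ignored.

```python
dev_losses = [5.5692, 5.2857, 5.2440, 5.1546, 5.1595,
              5.1826, 5.2310, 5.2968, 5.1945]  # epochs 1..9 from this log

patience = 5            # early-stopping patience
scheduler_patience = 3  # plateau-scheduler patience
factor = 0.1            # assumed reduction factor
lr = 0.001

best = float("inf")
bad_epochs = 0            # epochs since the last improvement
scheduler_bad_epochs = 0  # same counter, reset after each LR reduction
best_epoch = None
lr_reduced_after = None
stopped_after = None

for epoch, loss in enumerate(dev_losses, start=1):
    if loss < best:
        best, best_epoch = loss, epoch
        bad_epochs = scheduler_bad_epochs = 0
    else:
        bad_epochs += 1
        scheduler_bad_epochs += 1
    # Reduce the LR once the number of bad epochs exceeds the scheduler patience.
    if scheduler_bad_epochs > scheduler_patience:
        lr *= factor
        lr_reduced_after = epoch
        scheduler_bad_epochs = 0
    # Stop once the number of bad epochs reaches the early-stopping patience.
    if bad_epochs >= patience:
        stopped_after = epoch
        break

print(best_epoch, lr_reduced_after, stopped_after, lr)
```

With the logged losses, this replay finds the best epoch at 4, reduces the learning rate after epoch 8 (so epoch 9 trains at `0.0001`), and stops after epoch 9, matching the log.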
2024-07-29 10:39:56,497 Finished Training
2024-07-29 10:40:14,449 TEST Perplexity: 173.0781
2024-07-29 10:49:34,588 TEST BLEU = 17.27 82.9/65.2/22.1/0.7 (BP = 1.000 ratio = 1.000 hyp_len = 70 ref_len = 70)
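Note: the BLEU line lists the score, the 1- to 4-gram precisions, and the brevity penalty (BP). BLEU is the brevity penalty times the geometric mean of the n-gram precisions. The sketch below recomputes it from the printed values; because those precisions are rounded to one decimal, the result is only approximately equal to the reported 17.27, and the tiny 4-gram precision dominates the rounding error.

```python
import math

precisions = [82.9, 65.2, 22.1, 0.7]  # 1- to 4-gram precisions (percent), as printed
brevity_penalty = 1.000               # "BP = 1.000"

# Geometric mean of the precisions, computed in log space.
log_mean = sum(math.log(p) for p in precisions) / len(precisions)
bleu = brevity_penalty * math.exp(log_mean)

print(f"recomputed BLEU from rounded precisions: {bleu:.2f}")
```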