Commit c875faf (verified) by dobbersc
Parent(s): 20bd4ad

Add greek models
models/en2el/character_end2end_embeddings_with_attention/log.txt ADDED
@@ -0,0 +1,154 @@
+ 2024-07-30 02:13:30,544 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 02:13:30,544 Training Model
+ 2024-07-30 02:13:30,544 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 02:13:30,544 Translator(
+   (encoder): EncoderLSTM(
+     (embedding): Embedding(107, 300, padding_idx=0)
+     (dropout): Dropout(p=0.1, inplace=False)
+     (lstm): LSTM(300, 512, batch_first=True)
+   )
+   (decoder): DecoderLSTM(
+     (embedding): Embedding(128, 300, padding_idx=0)
+     (dropout): Dropout(p=0.1, inplace=False)
+     (lstm): LSTM(300, 512, batch_first=True)
+     (attention): DotProductAttention(
+       (softmax): Softmax(dim=-1)
+       (combined2hidden): Sequential(
+         (0): Linear(in_features=1024, out_features=512, bias=True)
+         (1): ReLU()
+       )
+     )
+     (hidden2vocab): Linear(in_features=512, out_features=128, bias=True)
+     (log_softmax): LogSoftmax(dim=-1)
+   )
+ )
+ 2024-07-30 02:13:30,545 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 02:13:30,545 Training Hyperparameters:
+ 2024-07-30 02:13:30,545 - max_epochs: 10
+ 2024-07-30 02:13:30,545 - learning_rate: 0.001
+ 2024-07-30 02:13:30,545 - batch_size: 128
+ 2024-07-30 02:13:30,545 - patience: 5
+ 2024-07-30 02:13:30,545 - scheduler_patience: 3
+ 2024-07-30 02:13:30,545 - teacher_forcing_ratio: 0.5
+ 2024-07-30 02:13:30,545 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 02:13:30,545 Computational Parameters:
+ 2024-07-30 02:13:30,545 - num_workers: 4
+ 2024-07-30 02:13:30,545 - device: device(type='cuda', index=0)
+ 2024-07-30 02:13:30,545 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 02:13:30,545 Dataset Splits:
+ 2024-07-30 02:13:30,545 - train: 85949 data points
+ 2024-07-30 02:13:30,545 - dev: 12279 data points
+ 2024-07-30 02:13:30,545 - test: 24557 data points
+ 2024-07-30 02:13:30,545 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 02:13:30,545 EPOCH 1
+ 2024-07-30 02:15:42,182 batch 67/672 - loss 3.19545212 - lr 0.0010 - time 131.64s
+ 2024-07-30 02:17:40,099 batch 134/672 - loss 3.02495554 - lr 0.0010 - time 249.55s
+ 2024-07-30 02:19:39,521 batch 201/672 - loss 2.92257840 - lr 0.0010 - time 368.98s
+ 2024-07-30 02:22:13,372 batch 268/672 - loss 2.85199871 - lr 0.0010 - time 522.83s
+ 2024-07-30 02:24:16,980 batch 335/672 - loss 2.79420793 - lr 0.0010 - time 646.43s
+ 2024-07-30 02:26:25,476 batch 402/672 - loss 2.74788210 - lr 0.0010 - time 774.93s
+ 2024-07-30 02:28:37,609 batch 469/672 - loss 2.70773431 - lr 0.0010 - time 907.06s
+ 2024-07-30 02:30:34,985 batch 536/672 - loss 2.67195398 - lr 0.0010 - time 1024.44s
+ 2024-07-30 02:32:46,538 batch 603/672 - loss 2.64084060 - lr 0.0010 - time 1155.99s
+ 2024-07-30 02:34:55,748 batch 670/672 - loss 2.61186988 - lr 0.0010 - time 1285.20s
+ 2024-07-30 02:34:58,934 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 02:34:58,937 EPOCH 1 DONE
+ 2024-07-30 02:35:33,168 TRAIN Loss: 2.6108
+ 2024-07-30 02:35:33,168 DEV Loss: 4.0377
+ 2024-07-30 02:35:33,168 DEV Perplexity: 56.6981
+ 2024-07-30 02:35:33,168 New best score!
+ 2024-07-30 02:35:33,170 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 02:35:33,170 EPOCH 2
+ 2024-07-30 02:37:40,712 batch 67/672 - loss 2.32208106 - lr 0.0010 - time 127.54s
+ 2024-07-30 02:39:39,545 batch 134/672 - loss 2.30324291 - lr 0.0010 - time 246.38s
+ 2024-07-30 02:41:50,177 batch 201/672 - loss 2.29119577 - lr 0.0010 - time 377.01s
+ 2024-07-30 02:44:22,124 batch 268/672 - loss 2.27651633 - lr 0.0010 - time 528.95s
+ 2024-07-30 02:46:29,564 batch 335/672 - loss 2.26064277 - lr 0.0010 - time 656.39s
+ 2024-07-30 02:48:27,268 batch 402/672 - loss 2.24953536 - lr 0.0010 - time 774.10s
+ 2024-07-30 02:50:23,982 batch 469/672 - loss 2.23849808 - lr 0.0010 - time 890.81s
+ 2024-07-30 02:52:40,137 batch 536/672 - loss 2.22690770 - lr 0.0010 - time 1026.97s
+ 2024-07-30 02:54:52,910 batch 603/672 - loss 2.21315394 - lr 0.0010 - time 1159.74s
+ 2024-07-30 02:57:00,732 batch 670/672 - loss 2.19986962 - lr 0.0010 - time 1287.56s
+ 2024-07-30 02:57:03,825 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 02:57:03,828 EPOCH 2 DONE
+ 2024-07-30 02:57:38,031 TRAIN Loss: 2.1993
+ 2024-07-30 02:57:38,032 DEV Loss: 4.1666
+ 2024-07-30 02:57:38,033 DEV Perplexity: 64.4964
+ 2024-07-30 02:57:38,033 No improvement for 1 epoch(s)
+ 2024-07-30 02:57:38,033 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 02:57:38,033 EPOCH 3
+ 2024-07-30 02:59:46,882 batch 67/672 - loss 2.07768523 - lr 0.0010 - time 128.85s
+ 2024-07-30 03:02:13,067 batch 134/672 - loss 2.06771447 - lr 0.0010 - time 275.03s
+ 2024-07-30 03:04:13,352 batch 201/672 - loss 2.05206243 - lr 0.0010 - time 395.32s
+ 2024-07-30 03:06:15,924 batch 268/672 - loss 2.03767699 - lr 0.0010 - time 517.89s
+ 2024-07-30 03:08:29,454 batch 335/672 - loss 2.02756568 - lr 0.0010 - time 651.42s
+ 2024-07-30 03:10:36,938 batch 402/672 - loss 2.01690815 - lr 0.0010 - time 778.90s
+ 2024-07-30 03:12:44,576 batch 469/672 - loss 2.00959916 - lr 0.0010 - time 906.54s
+ 2024-07-30 03:14:42,904 batch 536/672 - loss 1.99967818 - lr 0.0010 - time 1024.87s
+ 2024-07-30 03:17:05,177 batch 603/672 - loss 1.99148476 - lr 0.0010 - time 1167.14s
+ 2024-07-30 03:19:10,961 batch 670/672 - loss 1.98160288 - lr 0.0010 - time 1292.93s
+ 2024-07-30 03:19:13,988 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 03:19:13,990 EPOCH 3 DONE
+ 2024-07-30 03:19:48,048 TRAIN Loss: 1.9813
+ 2024-07-30 03:19:48,050 DEV Loss: 4.2504
+ 2024-07-30 03:19:48,050 DEV Perplexity: 70.1329
+ 2024-07-30 03:19:48,050 No improvement for 2 epoch(s)
+ 2024-07-30 03:19:48,050 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 03:19:48,050 EPOCH 4
+ 2024-07-30 03:21:58,453 batch 67/672 - loss 1.87205641 - lr 0.0010 - time 130.40s
+ 2024-07-30 03:23:54,350 batch 134/672 - loss 1.87312458 - lr 0.0010 - time 246.30s
+ 2024-07-30 03:26:26,003 batch 201/672 - loss 1.86491152 - lr 0.0010 - time 397.95s
+ 2024-07-30 03:28:31,716 batch 268/672 - loss 1.85794664 - lr 0.0010 - time 523.67s
+ 2024-07-30 03:30:58,523 batch 335/672 - loss 1.85268306 - lr 0.0010 - time 670.47s
+ 2024-07-30 03:32:55,289 batch 402/672 - loss 1.84701065 - lr 0.0010 - time 787.24s
+ 2024-07-30 03:35:17,440 batch 469/672 - loss 1.83774444 - lr 0.0010 - time 929.39s
+ 2024-07-30 03:37:17,765 batch 536/672 - loss 1.83106400 - lr 0.0010 - time 1049.71s
+ 2024-07-30 03:39:24,224 batch 603/672 - loss 1.82428703 - lr 0.0010 - time 1176.17s
+ 2024-07-30 03:41:19,788 batch 670/672 - loss 1.81979131 - lr 0.0010 - time 1291.74s
+ 2024-07-30 03:41:22,695 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 03:41:22,699 EPOCH 4 DONE
+ 2024-07-30 03:41:56,808 TRAIN Loss: 1.8197
+ 2024-07-30 03:41:56,809 DEV Loss: 4.5206
+ 2024-07-30 03:41:56,809 DEV Perplexity: 91.8923
+ 2024-07-30 03:41:56,809 No improvement for 3 epoch(s)
+ 2024-07-30 03:41:56,809 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 03:41:56,809 EPOCH 5
+ 2024-07-30 03:44:08,647 batch 67/672 - loss 1.75557149 - lr 0.0010 - time 131.84s
+ 2024-07-30 03:46:06,957 batch 134/672 - loss 1.74974602 - lr 0.0010 - time 250.15s
+ 2024-07-30 03:48:35,604 batch 201/672 - loss 1.74676394 - lr 0.0010 - time 398.79s
+ 2024-07-30 03:50:38,242 batch 268/672 - loss 1.74127575 - lr 0.0010 - time 521.43s
+ 2024-07-30 03:53:07,602 batch 335/672 - loss 1.73835572 - lr 0.0010 - time 670.79s
+ 2024-07-30 03:55:13,545 batch 402/672 - loss 1.73563331 - lr 0.0010 - time 796.74s
+ 2024-07-30 03:57:11,578 batch 469/672 - loss 1.73164746 - lr 0.0010 - time 914.77s
+ 2024-07-30 03:59:22,636 batch 536/672 - loss 1.72733042 - lr 0.0010 - time 1045.83s
+ 2024-07-30 04:01:27,047 batch 603/672 - loss 1.72134435 - lr 0.0010 - time 1170.24s
+ 2024-07-30 04:03:30,004 batch 670/672 - loss 1.71633585 - lr 0.0010 - time 1293.20s
+ 2024-07-30 04:03:33,232 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 04:03:33,234 EPOCH 5 DONE
+ 2024-07-30 04:04:07,476 TRAIN Loss: 1.7158
+ 2024-07-30 04:04:07,478 DEV Loss: 4.7345
+ 2024-07-30 04:04:07,478 DEV Perplexity: 113.8115
+ 2024-07-30 04:04:07,478 No improvement for 4 epoch(s)
+ 2024-07-30 04:04:07,478 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 04:04:07,478 EPOCH 6
+ 2024-07-30 04:06:07,000 batch 67/672 - loss 1.62654531 - lr 0.0001 - time 119.52s
+ 2024-07-30 04:08:25,932 batch 134/672 - loss 1.62444705 - lr 0.0001 - time 258.45s
+ 2024-07-30 04:10:20,762 batch 201/672 - loss 1.62080814 - lr 0.0001 - time 373.28s
+ 2024-07-30 04:12:32,261 batch 268/672 - loss 1.62108705 - lr 0.0001 - time 504.78s
+ 2024-07-30 04:14:45,293 batch 335/672 - loss 1.61820102 - lr 0.0001 - time 637.81s
+ 2024-07-30 04:16:51,180 batch 402/672 - loss 1.61746165 - lr 0.0001 - time 763.70s
+ 2024-07-30 04:18:59,569 batch 469/672 - loss 1.61459681 - lr 0.0001 - time 892.09s
+ 2024-07-30 04:21:23,775 batch 536/672 - loss 1.61302190 - lr 0.0001 - time 1036.30s
+ 2024-07-30 04:23:36,945 batch 603/672 - loss 1.60916015 - lr 0.0001 - time 1169.47s
+ 2024-07-30 04:25:42,320 batch 670/672 - loss 1.60702325 - lr 0.0001 - time 1294.84s
+ 2024-07-30 04:25:44,877 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 04:25:44,881 EPOCH 6 DONE
+ 2024-07-30 04:26:19,147 TRAIN Loss: 1.6068
+ 2024-07-30 04:26:19,148 DEV Loss: 4.7702
+ 2024-07-30 04:26:19,148 DEV Perplexity: 117.9441
+ 2024-07-30 04:26:19,148 No improvement for 5 epoch(s)
+ 2024-07-30 04:26:19,148 Patience reached: Terminating model training due to early stopping
+ 2024-07-30 04:26:19,148 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 04:26:19,148 Finished Training
+ 2024-07-30 04:27:25,619 TEST Perplexity: 56.6321
+ 2024-07-30 04:34:41,478 TEST BLEU = 3.34 40.7/3.8/1.3/0.6 (BP = 1.000 ratio = 1.000 hyp_len = 81 ref_len = 81)
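The log above trains the character-level model *with* attention. As a rough sketch of what the printed `DotProductAttention` block computes (dot-product scores over the encoder states, a `Softmax(dim=-1)`, then the `combined2hidden` `Linear(1024 -> 512)` followed by `ReLU` over the concatenated context and decoder state), here is a minimal unbatched NumPy version; the function name and shapes are illustrative, not the repository's actual code:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax, as in Softmax(dim=-1).
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(decoder_hidden, encoder_outputs, w, b):
    """One decoding step of a dot-product attention layer (a sketch).

    decoder_hidden:  (hidden,)         current decoder state, hidden = 512
    encoder_outputs: (src_len, hidden) encoder states, one per source character
    w, b:            parameters of the combined2hidden Linear(1024 -> 512)
    """
    scores = encoder_outputs @ decoder_hidden        # (src_len,) dot-product scores
    weights = softmax(scores)                        # attention weights, sum to 1
    context = weights @ encoder_outputs              # (hidden,) weighted context
    # Concatenate context and decoder state: 512 + 512 = 1024 features,
    # then project back to 512 with ReLU, matching combined2hidden above.
    combined = np.concatenate([context, decoder_hidden])
    return np.maximum(w @ combined + b, 0.0), weights
```

The `hidden2vocab` layer would then map the 512-dimensional result to the 128-entry target character vocabulary.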
models/en2el/character_end2end_embeddings_with_attention/model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a30b7c13f9fb67747ce5860d92cd44f640e63d710487102d6c837c07b4e5c63a
+ size 15989544
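The `model.pt` entries are Git LFS pointer files rather than the weights themselves: three `key value` lines giving the spec version, a sha256 object id, and the blob size in bytes. A small illustrative parser (the function name is ours, not part of any tooling):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file (version / oid / size lines) into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# The pointer committed above for the attention model's model.pt:
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:a30b7c13f9fb67747ce5860d92cd44f640e63d710487102d6c837c07b4e5c63a
size 15989544
"""
info = parse_lfs_pointer(pointer)
```

The `size` field says the actual checkpoint is about 16 MB; the repository itself stores only this 3-line stub.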
models/en2el/character_end2end_embeddings_without_attention/log.txt ADDED
@@ -0,0 +1,201 @@
+ 2024-07-30 04:34:50,329 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 04:34:50,329 Training Model
+ 2024-07-30 04:34:50,329 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 04:34:50,329 Translator(
+   (encoder): EncoderLSTM(
+     (embedding): Embedding(107, 300, padding_idx=0)
+     (dropout): Dropout(p=0.1, inplace=False)
+     (lstm): LSTM(300, 512, batch_first=True, bidirectional=True)
+   )
+   (decoder): DecoderLSTM(
+     (embedding): Embedding(128, 300, padding_idx=0)
+     (dropout): Dropout(p=0.1, inplace=False)
+     (lstm): LSTM(300, 1024, batch_first=True)
+     (hidden2vocab): Linear(in_features=1024, out_features=128, bias=True)
+     (log_softmax): LogSoftmax(dim=-1)
+   )
+ )
+ 2024-07-30 04:34:50,329 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 04:34:50,329 Training Hyperparameters:
+ 2024-07-30 04:34:50,329 - max_epochs: 10
+ 2024-07-30 04:34:50,329 - learning_rate: 0.001
+ 2024-07-30 04:34:50,329 - batch_size: 128
+ 2024-07-30 04:34:50,329 - patience: 5
+ 2024-07-30 04:34:50,329 - scheduler_patience: 3
+ 2024-07-30 04:34:50,329 - teacher_forcing_ratio: 0.5
+ 2024-07-30 04:34:50,329 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 04:34:50,329 Computational Parameters:
+ 2024-07-30 04:34:50,329 - num_workers: 4
+ 2024-07-30 04:34:50,329 - device: device(type='cuda', index=0)
+ 2024-07-30 04:34:50,329 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 04:34:50,329 Dataset Splits:
+ 2024-07-30 04:34:50,329 - train: 85949 data points
+ 2024-07-30 04:34:50,329 - dev: 12279 data points
+ 2024-07-30 04:34:50,329 - test: 24557 data points
+ 2024-07-30 04:34:50,329 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 04:34:50,329 EPOCH 1
+ 2024-07-30 04:35:23,933 batch 67/672 - loss 3.10038218 - lr 0.0010 - time 33.60s
+ 2024-07-30 04:35:57,827 batch 134/672 - loss 2.93854638 - lr 0.0010 - time 67.50s
+ 2024-07-30 04:36:34,629 batch 201/672 - loss 2.84284473 - lr 0.0010 - time 104.30s
+ 2024-07-30 04:37:08,206 batch 268/672 - loss 2.77865532 - lr 0.0010 - time 137.88s
+ 2024-07-30 04:37:40,691 batch 335/672 - loss 2.72912977 - lr 0.0010 - time 170.36s
+ 2024-07-30 04:38:15,937 batch 402/672 - loss 2.68943459 - lr 0.0010 - time 205.61s
+ 2024-07-30 04:38:49,471 batch 469/672 - loss 2.65444017 - lr 0.0010 - time 239.14s
+ 2024-07-30 04:39:23,887 batch 536/672 - loss 2.62400033 - lr 0.0010 - time 273.56s
+ 2024-07-30 04:39:59,724 batch 603/672 - loss 2.59848579 - lr 0.0010 - time 309.39s
+ 2024-07-30 04:40:33,398 batch 670/672 - loss 2.57486767 - lr 0.0010 - time 343.07s
+ 2024-07-30 04:40:34,493 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 04:40:34,494 EPOCH 1 DONE
+ 2024-07-30 04:40:53,264 TRAIN Loss: 2.5740
+ 2024-07-30 04:40:53,265 DEV Loss: 3.9082
+ 2024-07-30 04:40:53,265 DEV Perplexity: 49.8085
+ 2024-07-30 04:40:53,265 New best score!
+ 2024-07-30 04:40:53,266 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 04:40:53,266 EPOCH 2
+ 2024-07-30 04:41:27,020 batch 67/672 - loss 2.34747491 - lr 0.0010 - time 33.75s
+ 2024-07-30 04:42:01,349 batch 134/672 - loss 2.33755493 - lr 0.0010 - time 68.08s
+ 2024-07-30 04:42:36,651 batch 201/672 - loss 2.33333628 - lr 0.0010 - time 103.38s
+ 2024-07-30 04:43:11,513 batch 268/672 - loss 2.32438606 - lr 0.0010 - time 138.25s
+ 2024-07-30 04:43:45,864 batch 335/672 - loss 2.31273509 - lr 0.0010 - time 172.60s
+ 2024-07-30 04:44:20,686 batch 402/672 - loss 2.30774817 - lr 0.0010 - time 207.42s
+ 2024-07-30 04:44:56,655 batch 469/672 - loss 2.30176207 - lr 0.0010 - time 243.39s
+ 2024-07-30 04:45:30,196 batch 536/672 - loss 2.29646675 - lr 0.0010 - time 276.93s
+ 2024-07-30 04:46:02,683 batch 603/672 - loss 2.28910598 - lr 0.0010 - time 309.42s
+ 2024-07-30 04:46:36,611 batch 670/672 - loss 2.28247752 - lr 0.0010 - time 343.34s
+ 2024-07-30 04:46:37,848 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 04:46:37,849 EPOCH 2 DONE
+ 2024-07-30 04:46:56,561 TRAIN Loss: 2.2822
+ 2024-07-30 04:46:56,563 DEV Loss: 4.0626
+ 2024-07-30 04:46:56,563 DEV Perplexity: 58.1254
+ 2024-07-30 04:46:56,563 No improvement for 1 epoch(s)
+ 2024-07-30 04:46:56,563 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 04:46:56,563 EPOCH 3
+ 2024-07-30 04:47:30,137 batch 67/672 - loss 2.21325973 - lr 0.0010 - time 33.57s
+ 2024-07-30 04:48:04,026 batch 134/672 - loss 2.21132334 - lr 0.0010 - time 67.46s
+ 2024-07-30 04:48:38,533 batch 201/672 - loss 2.20828129 - lr 0.0010 - time 101.97s
+ 2024-07-30 04:49:16,008 batch 268/672 - loss 2.20410383 - lr 0.0010 - time 139.45s
+ 2024-07-30 04:49:49,441 batch 335/672 - loss 2.20347932 - lr 0.0010 - time 172.88s
+ 2024-07-30 04:50:24,571 batch 402/672 - loss 2.20117456 - lr 0.0010 - time 208.01s
+ 2024-07-30 04:50:58,049 batch 469/672 - loss 2.19920015 - lr 0.0010 - time 241.49s
+ 2024-07-30 04:51:31,770 batch 536/672 - loss 2.19497188 - lr 0.0010 - time 275.21s
+ 2024-07-30 04:52:04,483 batch 603/672 - loss 2.19227091 - lr 0.0010 - time 307.92s
+ 2024-07-30 04:52:40,432 batch 670/672 - loss 2.18894476 - lr 0.0010 - time 343.87s
+ 2024-07-30 04:52:41,406 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 04:52:41,407 EPOCH 3 DONE
+ 2024-07-30 04:53:00,057 TRAIN Loss: 2.1886
+ 2024-07-30 04:53:00,058 DEV Loss: 4.0608
+ 2024-07-30 04:53:00,058 DEV Perplexity: 58.0207
+ 2024-07-30 04:53:00,059 No improvement for 2 epoch(s)
+ 2024-07-30 04:53:00,059 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 04:53:00,059 EPOCH 4
+ 2024-07-30 04:53:35,157 batch 67/672 - loss 2.16437154 - lr 0.0010 - time 35.10s
+ 2024-07-30 04:54:10,957 batch 134/672 - loss 2.16561750 - lr 0.0010 - time 70.90s
+ 2024-07-30 04:54:46,558 batch 201/672 - loss 2.15123330 - lr 0.0010 - time 106.50s
+ 2024-07-30 04:55:23,074 batch 268/672 - loss 2.14830943 - lr 0.0010 - time 143.02s
+ 2024-07-30 04:55:56,263 batch 335/672 - loss 2.14384489 - lr 0.0010 - time 176.20s
+ 2024-07-30 04:56:29,516 batch 402/672 - loss 2.14317492 - lr 0.0010 - time 209.46s
+ 2024-07-30 04:57:03,034 batch 469/672 - loss 2.13807481 - lr 0.0010 - time 242.97s
+ 2024-07-30 04:57:38,106 batch 536/672 - loss 2.13714204 - lr 0.0010 - time 278.05s
+ 2024-07-30 04:58:11,804 batch 603/672 - loss 2.13427860 - lr 0.0010 - time 311.74s
+ 2024-07-30 04:58:45,446 batch 670/672 - loss 2.13089425 - lr 0.0010 - time 345.39s
+ 2024-07-30 04:58:46,510 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 04:58:46,512 EPOCH 4 DONE
+ 2024-07-30 04:59:05,228 TRAIN Loss: 2.1310
+ 2024-07-30 04:59:05,229 DEV Loss: 3.8841
+ 2024-07-30 04:59:05,229 DEV Perplexity: 48.6242
+ 2024-07-30 04:59:05,229 New best score!
+ 2024-07-30 04:59:05,230 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 04:59:05,230 EPOCH 5
+ 2024-07-30 04:59:39,865 batch 67/672 - loss 2.09914785 - lr 0.0010 - time 34.63s
+ 2024-07-30 05:00:12,979 batch 134/672 - loss 2.09666307 - lr 0.0010 - time 67.75s
+ 2024-07-30 05:00:45,938 batch 201/672 - loss 2.09769529 - lr 0.0010 - time 100.71s
+ 2024-07-30 05:01:21,022 batch 268/672 - loss 2.09707888 - lr 0.0010 - time 135.79s
+ 2024-07-30 05:01:55,606 batch 335/672 - loss 2.09564164 - lr 0.0010 - time 170.38s
+ 2024-07-30 05:02:30,685 batch 402/672 - loss 2.09658029 - lr 0.0010 - time 205.45s
+ 2024-07-30 05:03:06,180 batch 469/672 - loss 2.09829040 - lr 0.0010 - time 240.95s
+ 2024-07-30 05:03:41,411 batch 536/672 - loss 2.09503837 - lr 0.0010 - time 276.18s
+ 2024-07-30 05:04:15,328 batch 603/672 - loss 2.09097340 - lr 0.0010 - time 310.10s
+ 2024-07-30 05:04:50,806 batch 670/672 - loss 2.09143265 - lr 0.0010 - time 345.58s
+ 2024-07-30 05:04:51,742 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 05:04:51,744 EPOCH 5 DONE
+ 2024-07-30 05:05:10,440 TRAIN Loss: 2.0914
+ 2024-07-30 05:05:10,441 DEV Loss: 4.0964
+ 2024-07-30 05:05:10,441 DEV Perplexity: 60.1262
+ 2024-07-30 05:05:10,441 No improvement for 1 epoch(s)
+ 2024-07-30 05:05:10,441 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 05:05:10,441 EPOCH 6
+ 2024-07-30 05:05:43,928 batch 67/672 - loss 2.07384328 - lr 0.0010 - time 33.49s
+ 2024-07-30 05:06:18,330 batch 134/672 - loss 2.06872745 - lr 0.0010 - time 67.89s
+ 2024-07-30 05:06:52,416 batch 201/672 - loss 2.06421577 - lr 0.0010 - time 101.97s
+ 2024-07-30 05:07:28,878 batch 268/672 - loss 2.05813380 - lr 0.0010 - time 138.44s
+ 2024-07-30 05:08:04,454 batch 335/672 - loss 2.05886560 - lr 0.0010 - time 174.01s
+ 2024-07-30 05:08:39,090 batch 402/672 - loss 2.06078077 - lr 0.0010 - time 208.65s
+ 2024-07-30 05:09:13,874 batch 469/672 - loss 2.05874935 - lr 0.0010 - time 243.43s
+ 2024-07-30 05:09:48,701 batch 536/672 - loss 2.07250601 - lr 0.0010 - time 278.26s
+ 2024-07-30 05:10:23,311 batch 603/672 - loss 2.07534648 - lr 0.0010 - time 312.87s
+ 2024-07-30 05:10:59,023 batch 670/672 - loss 2.07414579 - lr 0.0010 - time 348.58s
+ 2024-07-30 05:10:59,870 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 05:10:59,871 EPOCH 6 DONE
+ 2024-07-30 05:11:18,515 TRAIN Loss: 2.0743
+ 2024-07-30 05:11:18,515 DEV Loss: 3.9534
+ 2024-07-30 05:11:18,515 DEV Perplexity: 52.1110
+ 2024-07-30 05:11:18,515 No improvement for 2 epoch(s)
+ 2024-07-30 05:11:18,515 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 05:11:18,515 EPOCH 7
+ 2024-07-30 05:11:52,412 batch 67/672 - loss 2.03792779 - lr 0.0010 - time 33.90s
+ 2024-07-30 05:12:26,407 batch 134/672 - loss 2.05658157 - lr 0.0010 - time 67.89s
+ 2024-07-30 05:13:03,550 batch 201/672 - loss 2.05529225 - lr 0.0010 - time 105.03s
+ 2024-07-30 05:13:39,381 batch 268/672 - loss 2.05117504 - lr 0.0010 - time 140.87s
+ 2024-07-30 05:14:12,825 batch 335/672 - loss 2.05023175 - lr 0.0010 - time 174.31s
+ 2024-07-30 05:14:47,599 batch 402/672 - loss 2.04865991 - lr 0.0010 - time 209.08s
+ 2024-07-30 05:15:23,442 batch 469/672 - loss 2.04811801 - lr 0.0010 - time 244.93s
+ 2024-07-30 05:15:58,345 batch 536/672 - loss 2.04754226 - lr 0.0010 - time 279.83s
+ 2024-07-30 05:16:32,285 batch 603/672 - loss 2.04571632 - lr 0.0010 - time 313.77s
+ 2024-07-30 05:17:05,555 batch 670/672 - loss 2.04303049 - lr 0.0010 - time 347.04s
+ 2024-07-30 05:17:06,402 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 05:17:06,403 EPOCH 7 DONE
+ 2024-07-30 05:17:24,980 TRAIN Loss: 2.0426
+ 2024-07-30 05:17:24,981 DEV Loss: 3.9780
+ 2024-07-30 05:17:24,981 DEV Perplexity: 53.4092
+ 2024-07-30 05:17:24,981 No improvement for 3 epoch(s)
+ 2024-07-30 05:17:24,981 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 05:17:24,981 EPOCH 8
+ 2024-07-30 05:17:58,705 batch 67/672 - loss 2.01949855 - lr 0.0010 - time 33.72s
+ 2024-07-30 05:18:35,336 batch 134/672 - loss 2.01598190 - lr 0.0010 - time 70.35s
+ 2024-07-30 05:19:09,132 batch 201/672 - loss 2.01850937 - lr 0.0010 - time 104.15s
+ 2024-07-30 05:19:44,524 batch 268/672 - loss 2.01815947 - lr 0.0010 - time 139.54s
+ 2024-07-30 05:20:16,018 batch 335/672 - loss 2.01943199 - lr 0.0010 - time 171.04s
+ 2024-07-30 05:20:52,150 batch 402/672 - loss 2.01703913 - lr 0.0010 - time 207.17s
+ 2024-07-30 05:21:28,374 batch 469/672 - loss 2.01854343 - lr 0.0010 - time 243.39s
+ 2024-07-30 05:22:03,634 batch 536/672 - loss 2.01803988 - lr 0.0010 - time 278.65s
+ 2024-07-30 05:22:38,061 batch 603/672 - loss 2.01876796 - lr 0.0010 - time 313.08s
+ 2024-07-30 05:23:11,962 batch 670/672 - loss 2.02086623 - lr 0.0010 - time 346.98s
+ 2024-07-30 05:23:12,878 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 05:23:12,879 EPOCH 8 DONE
+ 2024-07-30 05:23:31,606 TRAIN Loss: 2.0211
+ 2024-07-30 05:23:31,607 DEV Loss: 3.9168
+ 2024-07-30 05:23:31,607 DEV Perplexity: 50.2399
+ 2024-07-30 05:23:31,607 No improvement for 4 epoch(s)
+ 2024-07-30 05:23:31,607 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 05:23:31,607 EPOCH 9
+ 2024-07-30 05:24:05,806 batch 67/672 - loss 1.99463483 - lr 0.0001 - time 34.20s
+ 2024-07-30 05:24:41,641 batch 134/672 - loss 1.98969843 - lr 0.0001 - time 70.03s
+ 2024-07-30 05:25:14,737 batch 201/672 - loss 1.98845569 - lr 0.0001 - time 103.13s
+ 2024-07-30 05:25:49,179 batch 268/672 - loss 1.98329550 - lr 0.0001 - time 137.57s
+ 2024-07-30 05:26:23,329 batch 335/672 - loss 1.98188789 - lr 0.0001 - time 171.72s
+ 2024-07-30 05:27:00,271 batch 402/672 - loss 1.98281432 - lr 0.0001 - time 208.66s
+ 2024-07-30 05:27:34,774 batch 469/672 - loss 1.98228704 - lr 0.0001 - time 243.17s
+ 2024-07-30 05:28:09,111 batch 536/672 - loss 1.98135818 - lr 0.0001 - time 277.50s
+ 2024-07-30 05:28:42,468 batch 603/672 - loss 1.98060542 - lr 0.0001 - time 310.86s
+ 2024-07-30 05:29:15,784 batch 670/672 - loss 1.97999286 - lr 0.0001 - time 344.18s
+ 2024-07-30 05:29:17,462 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 05:29:17,466 EPOCH 9 DONE
+ 2024-07-30 05:29:36,209 TRAIN Loss: 1.9804
+ 2024-07-30 05:29:36,209 DEV Loss: 3.9323
+ 2024-07-30 05:29:36,209 DEV Perplexity: 51.0266
+ 2024-07-30 05:29:36,209 No improvement for 5 epoch(s)
+ 2024-07-30 05:29:36,209 Patience reached: Terminating model training due to early stopping
+ 2024-07-30 05:29:36,209 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 05:29:36,209 Finished Training
+ 2024-07-30 05:30:12,775 TEST Perplexity: 48.8405
+ 2024-07-30 05:36:52,535 TEST BLEU = 7.09 58.3/18.9/2.1/1.1 (BP = 1.000 ratio = 1.000 hyp_len = 96 ref_len = 96)
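Throughout these logs the reported perplexity is the exponential of the average per-token cross-entropy loss; for example, the epoch-1 dev loss of 3.9082 above corresponds to a dev perplexity of about 49.81. A one-liner (helper name ours) confirming the relationship against values taken from the log:

```python
import math

def perplexity(cross_entropy: float) -> float:
    """Perplexity as reported in the log: exp of the per-token cross-entropy."""
    return math.exp(cross_entropy)

# Dev loss / perplexity pairs from the log above:
assert abs(perplexity(3.9082) - 49.8085) < 0.01  # epoch 1
assert abs(perplexity(3.8841) - 48.6242) < 0.01  # epoch 4 (best score)
```

Lower loss and lower perplexity are equivalent criteria, which is why the "New best score!" lines coincide with dev-loss minima.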
models/en2el/character_end2end_embeddings_without_attention/model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f080e5baa78aa830caec4ab51e0220caa24e85969caf265707ef05ebcd597eb0
+ size 35877428
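The logs use a simple early-stopping scheme: track the best dev score, count consecutive non-improving epochs ("No improvement for N epoch(s)"), and stop once the count reaches `patience` (5 here). A sketch (function name hypothetical) that reproduces the epoch-9 stop of the character model without attention:

```python
def epochs_until_early_stop(dev_scores, patience=5):
    """Return the number of epochs run before early stopping triggers.

    dev_scores are per-epoch dev perplexities (lower is better); training
    stops once `patience` consecutive epochs fail to beat the best score.
    """
    best = float("inf")
    bad_epochs = 0
    for epoch, score in enumerate(dev_scores, start=1):
        if score < best:
            best = score
            bad_epochs = 0        # "New best score!"
        else:
            bad_epochs += 1       # "No improvement for N epoch(s)"
            if bad_epochs >= patience:
                return epoch      # "Patience reached"
    return len(dev_scores)

# Dev perplexities from the without-attention log above (epochs 1-9):
dev_ppl = [49.8085, 58.1254, 58.0207, 48.6242,
           60.1262, 52.1110, 53.4092, 50.2399, 51.0266]
```

With the best score at epoch 4 and five non-improving epochs after it, training stops at epoch 9, matching the log. The `scheduler_patience: 3` setting separately lowers the learning rate (0.0010 to 0.0001 in these runs) before patience is exhausted.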
models/en2el/word_end2end_embeddings_with_attention/log.txt ADDED
@@ -0,0 +1,225 @@
+ 2024-07-30 05:37:15,468 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 05:37:15,468 Training Model
+ 2024-07-30 05:37:15,468 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 05:37:15,468 Translator(
+   (encoder): EncoderLSTM(
+     (embedding): Embedding(11860, 300, padding_idx=0)
+     (dropout): Dropout(p=0.1, inplace=False)
+     (lstm): LSTM(300, 512, batch_first=True)
+   )
+   (decoder): DecoderLSTM(
+     (embedding): Embedding(20803, 300, padding_idx=0)
+     (dropout): Dropout(p=0.1, inplace=False)
+     (lstm): LSTM(300, 512, batch_first=True)
+     (attention): DotProductAttention(
+       (softmax): Softmax(dim=-1)
+       (combined2hidden): Sequential(
+         (0): Linear(in_features=1024, out_features=512, bias=True)
+         (1): ReLU()
+       )
+     )
+     (hidden2vocab): Linear(in_features=512, out_features=20803, bias=True)
+     (log_softmax): LogSoftmax(dim=-1)
+   )
+ )
+ 2024-07-30 05:37:15,468 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 05:37:15,468 Training Hyperparameters:
+ 2024-07-30 05:37:15,468 - max_epochs: 10
+ 2024-07-30 05:37:15,468 - learning_rate: 0.001
+ 2024-07-30 05:37:15,468 - batch_size: 128
+ 2024-07-30 05:37:15,468 - patience: 5
+ 2024-07-30 05:37:15,468 - scheduler_patience: 3
+ 2024-07-30 05:37:15,468 - teacher_forcing_ratio: 0.5
+ 2024-07-30 05:37:15,468 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 05:37:15,468 Computational Parameters:
+ 2024-07-30 05:37:15,468 - num_workers: 4
+ 2024-07-30 05:37:15,468 - device: device(type='cuda', index=0)
+ 2024-07-30 05:37:15,468 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 05:37:15,468 Dataset Splits:
+ 2024-07-30 05:37:15,468 - train: 85949 data points
+ 2024-07-30 05:37:15,468 - dev: 12279 data points
+ 2024-07-30 05:37:15,468 - test: 24557 data points
+ 2024-07-30 05:37:15,468 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 05:37:15,468 EPOCH 1
+ 2024-07-30 05:37:32,927 batch 67/672 - loss 7.02975038 - lr 0.0010 - time 17.46s
+ 2024-07-30 05:37:47,949 batch 134/672 - loss 6.71117427 - lr 0.0010 - time 32.48s
+ 2024-07-30 05:38:02,964 batch 201/672 - loss 6.54099928 - lr 0.0010 - time 47.50s
+ 2024-07-30 05:38:17,283 batch 268/672 - loss 6.41611423 - lr 0.0010 - time 61.81s
+ 2024-07-30 05:38:31,966 batch 335/672 - loss 6.29867061 - lr 0.0010 - time 76.50s
+ 2024-07-30 05:38:46,988 batch 402/672 - loss 6.18939765 - lr 0.0010 - time 91.52s
+ 2024-07-30 05:39:01,806 batch 469/672 - loss 6.08893062 - lr 0.0010 - time 106.34s
+ 2024-07-30 05:39:16,615 batch 536/672 - loss 5.99670628 - lr 0.0010 - time 121.15s
+ 2024-07-30 05:39:32,715 batch 603/672 - loss 5.90835549 - lr 0.0010 - time 137.25s
+ 2024-07-30 05:39:48,372 batch 670/672 - loss 5.82820846 - lr 0.0010 - time 152.90s
+ 2024-07-30 05:39:48,928 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 05:39:48,929 EPOCH 1 DONE
+ 2024-07-30 05:39:55,582 TRAIN Loss: 5.8259
+ 2024-07-30 05:39:55,583 DEV Loss: 5.9498
+ 2024-07-30 05:39:55,583 DEV Perplexity: 383.6837
+ 2024-07-30 05:39:55,583 New best score!
+ 2024-07-30 05:39:55,584 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 05:39:55,584 EPOCH 2
+ 2024-07-30 05:40:10,237 batch 67/672 - loss 4.94811222 - lr 0.0010 - time 14.65s
+ 2024-07-30 05:40:24,957 batch 134/672 - loss 4.87957455 - lr 0.0010 - time 29.37s
+ 2024-07-30 05:40:39,341 batch 201/672 - loss 4.83254105 - lr 0.0010 - time 43.76s
+ 2024-07-30 05:40:53,204 batch 268/672 - loss 4.78420364 - lr 0.0010 - time 57.62s
+ 2024-07-30 05:41:09,735 batch 335/672 - loss 4.73921762 - lr 0.0010 - time 74.15s
+ 2024-07-30 05:41:25,335 batch 402/672 - loss 4.69684552 - lr 0.0010 - time 89.75s
+ 2024-07-30 05:41:39,835 batch 469/672 - loss 4.66227807 - lr 0.0010 - time 104.25s
+ 2024-07-30 05:41:54,143 batch 536/672 - loss 4.62496828 - lr 0.0010 - time 118.56s
+ 2024-07-30 05:42:11,248 batch 603/672 - loss 4.59447982 - lr 0.0010 - time 135.66s
+ 2024-07-30 05:42:27,703 batch 670/672 - loss 4.56273915 - lr 0.0010 - time 152.12s
+ 2024-07-30 05:42:28,145 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 05:42:28,146 EPOCH 2 DONE
+ 2024-07-30 05:42:34,823 TRAIN Loss: 4.5617
+ 2024-07-30 05:42:34,823 DEV Loss: 5.6678
+ 2024-07-30 05:42:34,824 DEV Perplexity: 289.3950
+ 2024-07-30 05:42:34,824 New best score!
+ 2024-07-30 05:42:34,825 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 05:42:34,825 EPOCH 3
+ 2024-07-30 05:42:50,156 batch 67/672 - loss 4.10132092 - lr 0.0010 - time 15.33s
+ 2024-07-30 05:43:04,641 batch 134/672 - loss 4.07855133 - lr 0.0010 - time 29.82s
+ 2024-07-30 05:43:20,810 batch 201/672 - loss 4.06460287 - lr 0.0010 - time 45.99s
+ 2024-07-30 05:43:35,663 batch 268/672 - loss 4.04538993 - lr 0.0010 - time 60.84s
+ 2024-07-30 05:43:51,597 batch 335/672 - loss 4.02722402 - lr 0.0010 - time 76.77s
85
+ 2024-07-30 05:44:06,421 batch 402/672 - loss 4.01842412 - lr 0.0010 - time 91.60s
86
+ 2024-07-30 05:44:22,885 batch 469/672 - loss 4.00276648 - lr 0.0010 - time 108.06s
87
+ 2024-07-30 05:44:38,368 batch 536/672 - loss 3.98641537 - lr 0.0010 - time 123.54s
88
+ 2024-07-30 05:44:52,112 batch 603/672 - loss 3.96747647 - lr 0.0010 - time 137.29s
89
+ 2024-07-30 05:45:06,111 batch 670/672 - loss 3.95417700 - lr 0.0010 - time 151.29s
90
+ 2024-07-30 05:45:06,622 ----------------------------------------------------------------------------------------------------
91
+ 2024-07-30 05:45:06,623 EPOCH 3 DONE
92
+ 2024-07-30 05:45:13,296 TRAIN Loss: 3.9541
93
+ 2024-07-30 05:45:13,297 DEV Loss: 5.5469
94
+ 2024-07-30 05:45:13,297 DEV Perplexity: 256.4349
95
+ 2024-07-30 05:45:13,297 New best score!
96
+ 2024-07-30 05:45:13,298 ----------------------------------------------------------------------------------------------------
97
+ 2024-07-30 05:45:13,298 EPOCH 4
98
+ 2024-07-30 05:45:27,530 batch 67/672 - loss 3.58314629 - lr 0.0010 - time 14.23s
99
+ 2024-07-30 05:45:42,047 batch 134/672 - loss 3.61880102 - lr 0.0010 - time 28.75s
100
+ 2024-07-30 05:45:57,236 batch 201/672 - loss 3.59876704 - lr 0.0010 - time 43.94s
101
+ 2024-07-30 05:46:12,211 batch 268/672 - loss 3.60037185 - lr 0.0010 - time 58.91s
102
+ 2024-07-30 05:46:26,663 batch 335/672 - loss 3.59520071 - lr 0.0010 - time 73.36s
103
+ 2024-07-30 05:46:42,085 batch 402/672 - loss 3.58989806 - lr 0.0010 - time 88.79s
104
+ 2024-07-30 05:46:57,170 batch 469/672 - loss 3.59024472 - lr 0.0010 - time 103.87s
105
+ 2024-07-30 05:47:13,003 batch 536/672 - loss 3.58583019 - lr 0.0010 - time 119.70s
106
+ 2024-07-30 05:47:28,724 batch 603/672 - loss 3.58285773 - lr 0.0010 - time 135.43s
107
+ 2024-07-30 05:47:44,206 batch 670/672 - loss 3.57629914 - lr 0.0010 - time 150.91s
108
+ 2024-07-30 05:47:44,707 ----------------------------------------------------------------------------------------------------
109
+ 2024-07-30 05:47:44,708 EPOCH 4 DONE
110
+ 2024-07-30 05:47:51,341 TRAIN Loss: 3.5756
111
+ 2024-07-30 05:47:51,342 DEV Loss: 5.5492
112
+ 2024-07-30 05:47:51,342 DEV Perplexity: 257.0241
113
+ 2024-07-30 05:47:51,342 No improvement for 1 epoch(s)
114
+ 2024-07-30 05:47:51,342 ----------------------------------------------------------------------------------------------------
115
+ 2024-07-30 05:47:51,342 EPOCH 5
116
+ 2024-07-30 05:48:08,249 batch 67/672 - loss 3.34088491 - lr 0.0010 - time 16.91s
117
+ 2024-07-30 05:48:22,070 batch 134/672 - loss 3.33345445 - lr 0.0010 - time 30.73s
118
+ 2024-07-30 05:48:37,068 batch 201/672 - loss 3.34338642 - lr 0.0010 - time 45.73s
119
+ 2024-07-30 05:48:52,643 batch 268/672 - loss 3.34294628 - lr 0.0010 - time 61.30s
120
+ 2024-07-30 05:49:07,513 batch 335/672 - loss 3.34124798 - lr 0.0010 - time 76.17s
121
+ 2024-07-30 05:49:21,720 batch 402/672 - loss 3.33298490 - lr 0.0010 - time 90.38s
122
+ 2024-07-30 05:49:36,527 batch 469/672 - loss 3.33775506 - lr 0.0010 - time 105.19s
123
+ 2024-07-30 05:49:51,453 batch 536/672 - loss 3.33724512 - lr 0.0010 - time 120.11s
124
+ 2024-07-30 05:50:06,222 batch 603/672 - loss 3.33896945 - lr 0.0010 - time 134.88s
125
+ 2024-07-30 05:50:22,129 batch 670/672 - loss 3.33729344 - lr 0.0010 - time 150.79s
126
+ 2024-07-30 05:50:22,681 ----------------------------------------------------------------------------------------------------
127
+ 2024-07-30 05:50:22,682 EPOCH 5 DONE
128
+ 2024-07-30 05:50:29,337 TRAIN Loss: 3.3369
129
+ 2024-07-30 05:50:29,338 DEV Loss: 5.5888
130
+ 2024-07-30 05:50:29,338 DEV Perplexity: 267.4194
131
+ 2024-07-30 05:50:29,338 No improvement for 2 epoch(s)
132
+ 2024-07-30 05:50:29,338 ----------------------------------------------------------------------------------------------------
133
+ 2024-07-30 05:50:29,338 EPOCH 6
134
+ 2024-07-30 05:50:44,538 batch 67/672 - loss 3.09988255 - lr 0.0010 - time 15.20s
135
+ 2024-07-30 05:51:00,031 batch 134/672 - loss 3.12499664 - lr 0.0010 - time 30.69s
136
+ 2024-07-30 05:51:14,763 batch 201/672 - loss 3.14307480 - lr 0.0010 - time 45.42s
137
+ 2024-07-30 05:51:30,017 batch 268/672 - loss 3.15283256 - lr 0.0010 - time 60.68s
138
+ 2024-07-30 05:51:45,540 batch 335/672 - loss 3.14810573 - lr 0.0010 - time 76.20s
139
+ 2024-07-30 05:52:00,299 batch 402/672 - loss 3.15746469 - lr 0.0010 - time 90.96s
140
+ 2024-07-30 05:52:15,529 batch 469/672 - loss 3.15836043 - lr 0.0010 - time 106.19s
141
+ 2024-07-30 05:52:31,790 batch 536/672 - loss 3.16262923 - lr 0.0010 - time 122.45s
142
+ 2024-07-30 05:52:47,277 batch 603/672 - loss 3.16002134 - lr 0.0010 - time 137.94s
143
+ 2024-07-30 05:53:01,751 batch 670/672 - loss 3.16646054 - lr 0.0010 - time 152.41s
144
+ 2024-07-30 05:53:02,395 ----------------------------------------------------------------------------------------------------
145
+ 2024-07-30 05:53:02,396 EPOCH 6 DONE
146
+ 2024-07-30 05:53:09,115 TRAIN Loss: 3.1667
147
+ 2024-07-30 05:53:09,115 DEV Loss: 5.5071
148
+ 2024-07-30 05:53:09,115 DEV Perplexity: 246.4274
149
+ 2024-07-30 05:53:09,115 New best score!
150
+ 2024-07-30 05:53:09,116 ----------------------------------------------------------------------------------------------------
151
+ 2024-07-30 05:53:09,116 EPOCH 7
152
+ 2024-07-30 05:53:24,807 batch 67/672 - loss 2.98739854 - lr 0.0010 - time 15.69s
153
+ 2024-07-30 05:53:39,149 batch 134/672 - loss 2.98036382 - lr 0.0010 - time 30.03s
154
+ 2024-07-30 05:53:53,482 batch 201/672 - loss 2.98568483 - lr 0.0010 - time 44.37s
155
+ 2024-07-30 05:54:08,551 batch 268/672 - loss 2.99329857 - lr 0.0010 - time 59.43s
156
+ 2024-07-30 05:54:22,699 batch 335/672 - loss 3.00473778 - lr 0.0010 - time 73.58s
157
+ 2024-07-30 05:54:37,180 batch 402/672 - loss 3.01970972 - lr 0.0010 - time 88.06s
158
+ 2024-07-30 05:54:54,515 batch 469/672 - loss 3.02301556 - lr 0.0010 - time 105.40s
159
+ 2024-07-30 05:55:10,416 batch 536/672 - loss 3.02170302 - lr 0.0010 - time 121.30s
160
+ 2024-07-30 05:55:26,104 batch 603/672 - loss 3.02361924 - lr 0.0010 - time 136.99s
161
+ 2024-07-30 05:55:42,134 batch 670/672 - loss 3.02671158 - lr 0.0010 - time 153.02s
162
+ 2024-07-30 05:55:42,550 ----------------------------------------------------------------------------------------------------
163
+ 2024-07-30 05:55:42,550 EPOCH 7 DONE
164
+ 2024-07-30 05:55:49,358 TRAIN Loss: 3.0269
165
+ 2024-07-30 05:55:49,359 DEV Loss: 5.6211
166
+ 2024-07-30 05:55:49,359 DEV Perplexity: 276.2037
167
+ 2024-07-30 05:55:49,359 No improvement for 1 epoch(s)
168
+ 2024-07-30 05:55:49,359 ----------------------------------------------------------------------------------------------------
169
+ 2024-07-30 05:55:49,359 EPOCH 8
170
+ 2024-07-30 05:56:06,200 batch 67/672 - loss 2.86453795 - lr 0.0010 - time 16.84s
171
+ 2024-07-30 05:56:21,805 batch 134/672 - loss 2.86024533 - lr 0.0010 - time 32.45s
172
+ 2024-07-30 05:56:36,757 batch 201/672 - loss 2.85630216 - lr 0.0010 - time 47.40s
173
+ 2024-07-30 05:56:52,468 batch 268/672 - loss 2.86597593 - lr 0.0010 - time 63.11s
174
+ 2024-07-30 05:57:07,027 batch 335/672 - loss 2.87845792 - lr 0.0010 - time 77.67s
175
+ 2024-07-30 05:57:21,279 batch 402/672 - loss 2.88857114 - lr 0.0010 - time 91.92s
176
+ 2024-07-30 05:57:37,150 batch 469/672 - loss 2.89169808 - lr 0.0010 - time 107.79s
177
+ 2024-07-30 05:57:52,295 batch 536/672 - loss 2.89775301 - lr 0.0010 - time 122.94s
178
+ 2024-07-30 05:58:07,862 batch 603/672 - loss 2.90178569 - lr 0.0010 - time 138.50s
179
+ 2024-07-30 05:58:22,653 batch 670/672 - loss 2.90217636 - lr 0.0010 - time 153.29s
180
+ 2024-07-30 05:58:23,114 ----------------------------------------------------------------------------------------------------
181
+ 2024-07-30 05:58:23,115 EPOCH 8 DONE
182
+ 2024-07-30 05:58:29,762 TRAIN Loss: 2.9031
183
+ 2024-07-30 05:58:29,762 DEV Loss: 5.6538
184
+ 2024-07-30 05:58:29,762 DEV Perplexity: 285.3849
185
+ 2024-07-30 05:58:29,762 No improvement for 2 epoch(s)
186
+ 2024-07-30 05:58:29,762 ----------------------------------------------------------------------------------------------------
187
+ 2024-07-30 05:58:29,762 EPOCH 9
188
+ 2024-07-30 05:58:44,994 batch 67/672 - loss 2.73835637 - lr 0.0010 - time 15.23s
189
+ 2024-07-30 05:59:00,180 batch 134/672 - loss 2.73880063 - lr 0.0010 - time 30.42s
190
+ 2024-07-30 05:59:15,963 batch 201/672 - loss 2.74995850 - lr 0.0010 - time 46.20s
191
+ 2024-07-30 05:59:31,437 batch 268/672 - loss 2.76165144 - lr 0.0010 - time 61.67s
192
+ 2024-07-30 05:59:46,416 batch 335/672 - loss 2.77234203 - lr 0.0010 - time 76.65s
193
+ 2024-07-30 06:00:01,072 batch 402/672 - loss 2.77924203 - lr 0.0010 - time 91.31s
194
+ 2024-07-30 06:00:15,766 batch 469/672 - loss 2.79409497 - lr 0.0010 - time 106.00s
195
+ 2024-07-30 06:00:31,180 batch 536/672 - loss 2.79967493 - lr 0.0010 - time 121.42s
196
+ 2024-07-30 06:00:46,193 batch 603/672 - loss 2.80317950 - lr 0.0010 - time 136.43s
197
+ 2024-07-30 06:01:02,490 batch 670/672 - loss 2.80855727 - lr 0.0010 - time 152.73s
198
+ 2024-07-30 06:01:02,875 ----------------------------------------------------------------------------------------------------
199
+ 2024-07-30 06:01:02,876 EPOCH 9 DONE
200
+ 2024-07-30 06:01:09,698 TRAIN Loss: 2.8086
201
+ 2024-07-30 06:01:09,698 DEV Loss: 5.6621
202
+ 2024-07-30 06:01:09,698 DEV Perplexity: 287.7391
203
+ 2024-07-30 06:01:09,698 No improvement for 3 epoch(s)
204
+ 2024-07-30 06:01:09,698 ----------------------------------------------------------------------------------------------------
205
+ 2024-07-30 06:01:09,698 EPOCH 10
206
+ 2024-07-30 06:01:24,618 batch 67/672 - loss 2.64406736 - lr 0.0010 - time 14.92s
207
+ 2024-07-30 06:01:40,065 batch 134/672 - loss 2.67658532 - lr 0.0010 - time 30.37s
208
+ 2024-07-30 06:01:54,965 batch 201/672 - loss 2.68379090 - lr 0.0010 - time 45.27s
209
+ 2024-07-30 06:02:09,870 batch 268/672 - loss 2.68974200 - lr 0.0010 - time 60.17s
210
+ 2024-07-30 06:02:24,367 batch 335/672 - loss 2.69738334 - lr 0.0010 - time 74.67s
211
+ 2024-07-30 06:02:39,352 batch 402/672 - loss 2.71200059 - lr 0.0010 - time 89.65s
212
+ 2024-07-30 06:02:55,514 batch 469/672 - loss 2.71962268 - lr 0.0010 - time 105.82s
213
+ 2024-07-30 06:03:11,123 batch 536/672 - loss 2.72209992 - lr 0.0010 - time 121.42s
214
+ 2024-07-30 06:03:27,930 batch 603/672 - loss 2.72870156 - lr 0.0010 - time 138.23s
215
+ 2024-07-30 06:03:42,785 batch 670/672 - loss 2.73530983 - lr 0.0010 - time 153.09s
216
+ 2024-07-30 06:03:43,269 ----------------------------------------------------------------------------------------------------
217
+ 2024-07-30 06:03:43,270 EPOCH 10 DONE
218
+ 2024-07-30 06:03:49,979 TRAIN Loss: 2.7360
219
+ 2024-07-30 06:03:49,980 DEV Loss: 5.7023
220
+ 2024-07-30 06:03:49,980 DEV Perplexity: 299.5418
221
+ 2024-07-30 06:03:49,980 No improvement for 4 epoch(s)
222
+ 2024-07-30 06:03:49,980 ----------------------------------------------------------------------------------------------------
223
+ 2024-07-30 06:03:49,980 Finished Training
224
+ 2024-07-30 06:04:02,968 TEST Perplexity: 243.8509
225
+ 2024-07-30 06:09:52,994 TEST BLEU = 51.71 95.2/73.2/40.0/25.6 (BP = 1.000 ratio = 1.000 hyp_len = 42 ref_len = 42)
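Note: the DEV/TEST perplexities in this log are simply the exponential of the logged average cross-entropy loss (e.g. epoch 1's DEV loss of 5.9498 gives the reported perplexity of 383.6837). A minimal sketch of that relationship:

```python
import math

def perplexity(avg_nll: float) -> float:
    # Perplexity is exp of the average per-token negative log-likelihood
    # (the "Loss" values in this log).
    return math.exp(avg_nll)

# Reproduces the logged pairs from this file:
#   DEV Loss 5.9498 -> DEV Perplexity ~383.68 (epoch 1)
#   DEV Loss 5.5071 -> DEV Perplexity ~246.43 (epoch 6, best)
print(round(perplexity(5.9498), 2))
print(round(perplexity(5.5071), 2))
```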
models/en2el/word_end2end_embeddings_with_attention/model.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e4efec14acc680bed342d84850f74b6d5f511d35512fb19258fda4855f3310cd
3
+ size 98458344
models/en2el/word_end2end_embeddings_without_attention/log.txt ADDED
@@ -0,0 +1,218 @@
1
+ 2024-07-30 06:10:16,329 ----------------------------------------------------------------------------------------------------
2
+ 2024-07-30 06:10:16,329 Training Model
3
+ 2024-07-30 06:10:16,329 ----------------------------------------------------------------------------------------------------
4
+ 2024-07-30 06:10:16,329 Translator(
5
+ (encoder): EncoderLSTM(
6
+ (embedding): Embedding(11860, 300, padding_idx=0)
7
+ (dropout): Dropout(p=0.1, inplace=False)
8
+ (lstm): LSTM(300, 512, batch_first=True, bidirectional=True)
9
+ )
10
+ (decoder): DecoderLSTM(
11
+ (embedding): Embedding(20803, 300, padding_idx=0)
12
+ (dropout): Dropout(p=0.1, inplace=False)
13
+ (lstm): LSTM(300, 1024, batch_first=True)
14
+ (hidden2vocab): Linear(in_features=1024, out_features=20803, bias=True)
15
+ (log_softmax): LogSoftmax(dim=-1)
16
+ )
17
+ )
18
+ 2024-07-30 06:10:16,329 ----------------------------------------------------------------------------------------------------
19
+ 2024-07-30 06:10:16,329 Training Hyperparameters:
20
+ 2024-07-30 06:10:16,329 - max_epochs: 10
21
+ 2024-07-30 06:10:16,329 - learning_rate: 0.001
22
+ 2024-07-30 06:10:16,329 - batch_size: 128
23
+ 2024-07-30 06:10:16,329 - patience: 5
24
+ 2024-07-30 06:10:16,329 - scheduler_patience: 3
25
+ 2024-07-30 06:10:16,329 - teacher_forcing_ratio: 0.5
26
+ 2024-07-30 06:10:16,329 ----------------------------------------------------------------------------------------------------
27
+ 2024-07-30 06:10:16,329 Computational Parameters:
28
+ 2024-07-30 06:10:16,329 - num_workers: 4
29
+ 2024-07-30 06:10:16,329 - device: device(type='cuda', index=0)
30
+ 2024-07-30 06:10:16,329 ----------------------------------------------------------------------------------------------------
31
+ 2024-07-30 06:10:16,329 Dataset Splits:
32
+ 2024-07-30 06:10:16,329 - train: 85949 data points
33
+ 2024-07-30 06:10:16,329 - dev: 12279 data points
34
+ 2024-07-30 06:10:16,329 - test: 24557 data points
35
+ 2024-07-30 06:10:16,329 ----------------------------------------------------------------------------------------------------
36
+ 2024-07-30 06:10:16,329 EPOCH 1
37
+ 2024-07-30 06:10:33,528 batch 67/672 - loss 6.89134038 - lr 0.0010 - time 17.20s
38
+ 2024-07-30 06:10:49,339 batch 134/672 - loss 6.57824612 - lr 0.0010 - time 33.01s
39
+ 2024-07-30 06:11:06,164 batch 201/672 - loss 6.39062692 - lr 0.0010 - time 49.83s
40
+ 2024-07-30 06:11:23,087 batch 268/672 - loss 6.24873080 - lr 0.0010 - time 66.76s
41
+ 2024-07-30 06:11:39,206 batch 335/672 - loss 6.14091628 - lr 0.0010 - time 82.88s
42
+ 2024-07-30 06:11:55,624 batch 402/672 - loss 6.05044948 - lr 0.0010 - time 99.30s
43
+ 2024-07-30 06:12:12,408 batch 469/672 - loss 5.97115639 - lr 0.0010 - time 116.08s
44
+ 2024-07-30 06:12:29,512 batch 536/672 - loss 5.90572211 - lr 0.0010 - time 133.18s
45
+ 2024-07-30 06:12:45,926 batch 603/672 - loss 5.84390986 - lr 0.0010 - time 149.60s
46
+ 2024-07-30 06:13:02,678 batch 670/672 - loss 5.79140960 - lr 0.0010 - time 166.35s
47
+ 2024-07-30 06:13:03,229 ----------------------------------------------------------------------------------------------------
48
+ 2024-07-30 06:13:03,229 EPOCH 1 DONE
49
+ 2024-07-30 06:13:10,739 TRAIN Loss: 5.7900
50
+ 2024-07-30 06:13:10,740 DEV Loss: 6.0754
51
+ 2024-07-30 06:13:10,740 DEV Perplexity: 435.0279
52
+ 2024-07-30 06:13:10,740 New best score!
53
+ 2024-07-30 06:13:10,741 ----------------------------------------------------------------------------------------------------
54
+ 2024-07-30 06:13:10,741 EPOCH 2
55
+ 2024-07-30 06:13:27,893 batch 67/672 - loss 5.19151245 - lr 0.0010 - time 17.15s
56
+ 2024-07-30 06:13:43,749 batch 134/672 - loss 5.18056433 - lr 0.0010 - time 33.01s
57
+ 2024-07-30 06:13:59,822 batch 201/672 - loss 5.15702770 - lr 0.0010 - time 49.08s
58
+ 2024-07-30 06:14:16,310 batch 268/672 - loss 5.13158063 - lr 0.0010 - time 65.57s
59
+ 2024-07-30 06:14:32,448 batch 335/672 - loss 5.11185801 - lr 0.0010 - time 81.71s
60
+ 2024-07-30 06:14:50,717 batch 402/672 - loss 5.09909616 - lr 0.0010 - time 99.98s
61
+ 2024-07-30 06:15:06,671 batch 469/672 - loss 5.08253138 - lr 0.0010 - time 115.93s
62
+ 2024-07-30 06:15:22,440 batch 536/672 - loss 5.06342435 - lr 0.0010 - time 131.70s
63
+ 2024-07-30 06:15:39,573 batch 603/672 - loss 5.04722015 - lr 0.0010 - time 148.83s
64
+ 2024-07-30 06:15:56,587 batch 670/672 - loss 5.03041611 - lr 0.0010 - time 165.85s
65
+ 2024-07-30 06:15:57,067 ----------------------------------------------------------------------------------------------------
66
+ 2024-07-30 06:15:57,068 EPOCH 2 DONE
67
+ 2024-07-30 06:16:04,730 TRAIN Loss: 5.0304
68
+ 2024-07-30 06:16:04,730 DEV Loss: 5.8563
69
+ 2024-07-30 06:16:04,730 DEV Perplexity: 349.4377
70
+ 2024-07-30 06:16:04,730 New best score!
71
+ 2024-07-30 06:16:04,731 ----------------------------------------------------------------------------------------------------
72
+ 2024-07-30 06:16:04,731 EPOCH 3
73
+ 2024-07-30 06:16:21,939 batch 67/672 - loss 4.74854994 - lr 0.0010 - time 17.21s
74
+ 2024-07-30 06:16:39,161 batch 134/672 - loss 4.75385062 - lr 0.0010 - time 34.43s
75
+ 2024-07-30 06:16:55,909 batch 201/672 - loss 4.74590490 - lr 0.0010 - time 51.18s
76
+ 2024-07-30 06:17:13,587 batch 268/672 - loss 4.73322697 - lr 0.0010 - time 68.86s
77
+ 2024-07-30 06:17:29,628 batch 335/672 - loss 4.72623819 - lr 0.0010 - time 84.90s
78
+ 2024-07-30 06:17:45,858 batch 402/672 - loss 4.71717374 - lr 0.0010 - time 101.13s
79
+ 2024-07-30 06:18:01,683 batch 469/672 - loss 4.71031598 - lr 0.0010 - time 116.95s
80
+ 2024-07-30 06:18:17,857 batch 536/672 - loss 4.70273496 - lr 0.0010 - time 133.13s
81
+ 2024-07-30 06:18:34,491 batch 603/672 - loss 4.69273729 - lr 0.0010 - time 149.76s
82
+ 2024-07-30 06:18:51,131 batch 670/672 - loss 4.68461839 - lr 0.0010 - time 166.40s
83
+ 2024-07-30 06:18:51,699 ----------------------------------------------------------------------------------------------------
84
+ 2024-07-30 06:18:51,700 EPOCH 3 DONE
85
+ 2024-07-30 06:18:59,286 TRAIN Loss: 4.6841
86
+ 2024-07-30 06:18:59,286 DEV Loss: 5.7668
87
+ 2024-07-30 06:18:59,286 DEV Perplexity: 319.5288
88
+ 2024-07-30 06:18:59,286 New best score!
89
+ 2024-07-30 06:18:59,288 ----------------------------------------------------------------------------------------------------
90
+ 2024-07-30 06:18:59,288 EPOCH 4
91
+ 2024-07-30 06:19:16,040 batch 67/672 - loss 4.42943953 - lr 0.0010 - time 16.75s
92
+ 2024-07-30 06:19:32,563 batch 134/672 - loss 4.44308816 - lr 0.0010 - time 33.28s
93
+ 2024-07-30 06:19:48,771 batch 201/672 - loss 4.43924424 - lr 0.0010 - time 49.48s
94
+ 2024-07-30 06:20:06,113 batch 268/672 - loss 4.43592648 - lr 0.0010 - time 66.83s
95
+ 2024-07-30 06:20:22,811 batch 335/672 - loss 4.43285133 - lr 0.0010 - time 83.52s
96
+ 2024-07-30 06:20:39,538 batch 402/672 - loss 4.42675229 - lr 0.0010 - time 100.25s
97
+ 2024-07-30 06:20:56,393 batch 469/672 - loss 4.42445208 - lr 0.0010 - time 117.11s
98
+ 2024-07-30 06:21:12,664 batch 536/672 - loss 4.42030223 - lr 0.0010 - time 133.38s
99
+ 2024-07-30 06:21:29,058 batch 603/672 - loss 4.41070776 - lr 0.0010 - time 149.77s
100
+ 2024-07-30 06:21:45,786 batch 670/672 - loss 4.40933425 - lr 0.0010 - time 166.50s
101
+ 2024-07-30 06:21:46,351 ----------------------------------------------------------------------------------------------------
102
+ 2024-07-30 06:21:46,352 EPOCH 4 DONE
103
+ 2024-07-30 06:21:53,967 TRAIN Loss: 4.4087
104
+ 2024-07-30 06:21:53,968 DEV Loss: 5.6346
105
+ 2024-07-30 06:21:53,968 DEV Perplexity: 279.9447
106
+ 2024-07-30 06:21:53,968 New best score!
107
+ 2024-07-30 06:21:53,969 ----------------------------------------------------------------------------------------------------
108
+ 2024-07-30 06:21:53,969 EPOCH 5
109
+ 2024-07-30 06:22:10,169 batch 67/672 - loss 4.15659056 - lr 0.0010 - time 16.20s
110
+ 2024-07-30 06:22:27,143 batch 134/672 - loss 4.15781051 - lr 0.0010 - time 33.17s
111
+ 2024-07-30 06:22:43,863 batch 201/672 - loss 4.17910115 - lr 0.0010 - time 49.89s
112
+ 2024-07-30 06:23:01,372 batch 268/672 - loss 4.18044725 - lr 0.0010 - time 67.40s
113
+ 2024-07-30 06:23:17,134 batch 335/672 - loss 4.18468309 - lr 0.0010 - time 83.16s
114
+ 2024-07-30 06:23:32,623 batch 402/672 - loss 4.18159520 - lr 0.0010 - time 98.65s
115
+ 2024-07-30 06:23:50,102 batch 469/672 - loss 4.19281469 - lr 0.0010 - time 116.13s
116
+ 2024-07-30 06:24:06,966 batch 536/672 - loss 4.19381198 - lr 0.0010 - time 133.00s
117
+ 2024-07-30 06:24:23,593 batch 603/672 - loss 4.19385485 - lr 0.0010 - time 149.62s
118
+ 2024-07-30 06:24:40,339 batch 670/672 - loss 4.19408414 - lr 0.0010 - time 166.37s
119
+ 2024-07-30 06:24:40,879 ----------------------------------------------------------------------------------------------------
120
+ 2024-07-30 06:24:40,879 EPOCH 5 DONE
121
+ 2024-07-30 06:24:48,451 TRAIN Loss: 4.1944
122
+ 2024-07-30 06:24:48,452 DEV Loss: 5.6667
123
+ 2024-07-30 06:24:48,452 DEV Perplexity: 289.0791
124
+ 2024-07-30 06:24:48,452 No improvement for 1 epoch(s)
125
+ 2024-07-30 06:24:48,452 ----------------------------------------------------------------------------------------------------
126
+ 2024-07-30 06:24:48,452 EPOCH 6
127
+ 2024-07-30 06:25:04,742 batch 67/672 - loss 3.95974011 - lr 0.0010 - time 16.29s
128
+ 2024-07-30 06:25:22,347 batch 134/672 - loss 3.97261752 - lr 0.0010 - time 33.89s
129
+ 2024-07-30 06:25:39,265 batch 201/672 - loss 3.97632658 - lr 0.0010 - time 50.81s
130
+ 2024-07-30 06:25:55,429 batch 268/672 - loss 3.97863425 - lr 0.0010 - time 66.98s
131
+ 2024-07-30 06:26:12,245 batch 335/672 - loss 3.98592581 - lr 0.0010 - time 83.79s
132
+ 2024-07-30 06:26:29,089 batch 402/672 - loss 3.99840168 - lr 0.0010 - time 100.64s
133
+ 2024-07-30 06:26:45,261 batch 469/672 - loss 4.00143650 - lr 0.0010 - time 116.81s
134
+ 2024-07-30 06:27:01,678 batch 536/672 - loss 4.00356728 - lr 0.0010 - time 133.23s
135
+ 2024-07-30 06:27:19,333 batch 603/672 - loss 4.00464411 - lr 0.0010 - time 150.88s
136
+ 2024-07-30 06:27:35,391 batch 670/672 - loss 4.00861603 - lr 0.0010 - time 166.94s
137
+ 2024-07-30 06:27:35,850 ----------------------------------------------------------------------------------------------------
138
+ 2024-07-30 06:27:35,850 EPOCH 6 DONE
139
+ 2024-07-30 06:27:43,578 TRAIN Loss: 4.0089
140
+ 2024-07-30 06:27:43,579 DEV Loss: 5.6235
141
+ 2024-07-30 06:27:43,579 DEV Perplexity: 276.8608
142
+ 2024-07-30 06:27:43,579 New best score!
143
+ 2024-07-30 06:27:43,580 ----------------------------------------------------------------------------------------------------
144
+ 2024-07-30 06:27:43,580 EPOCH 7
145
+ 2024-07-30 06:28:00,991 batch 67/672 - loss 3.81231925 - lr 0.0010 - time 17.41s
146
+ 2024-07-30 06:28:17,705 batch 134/672 - loss 3.81504109 - lr 0.0010 - time 34.12s
147
+ 2024-07-30 06:28:33,607 batch 201/672 - loss 3.81452682 - lr 0.0010 - time 50.03s
148
+ 2024-07-30 06:28:49,615 batch 268/672 - loss 3.82829350 - lr 0.0010 - time 66.03s
149
+ 2024-07-30 06:29:07,145 batch 335/672 - loss 3.82877066 - lr 0.0010 - time 83.57s
150
+ 2024-07-30 06:29:24,783 batch 402/672 - loss 3.84061924 - lr 0.0010 - time 101.20s
151
+ 2024-07-30 06:29:41,012 batch 469/672 - loss 3.84848637 - lr 0.0010 - time 117.43s
152
+ 2024-07-30 06:29:57,685 batch 536/672 - loss 3.85565654 - lr 0.0010 - time 134.11s
153
+ 2024-07-30 06:30:14,157 batch 603/672 - loss 3.85682274 - lr 0.0010 - time 150.58s
154
+ 2024-07-30 06:30:30,979 batch 670/672 - loss 3.86048785 - lr 0.0010 - time 167.40s
155
+ 2024-07-30 06:30:31,554 ----------------------------------------------------------------------------------------------------
156
+ 2024-07-30 06:30:31,555 EPOCH 7 DONE
157
+ 2024-07-30 06:30:39,002 TRAIN Loss: 3.8607
158
+ 2024-07-30 06:30:39,002 DEV Loss: 5.6687
159
+ 2024-07-30 06:30:39,003 DEV Perplexity: 289.6511
160
+ 2024-07-30 06:30:39,003 No improvement for 1 epoch(s)
161
+ 2024-07-30 06:30:39,003 ----------------------------------------------------------------------------------------------------
162
+ 2024-07-30 06:30:39,003 EPOCH 8
163
+ 2024-07-30 06:30:54,932 batch 67/672 - loss 3.64059773 - lr 0.0010 - time 15.93s
164
+ 2024-07-30 06:31:12,086 batch 134/672 - loss 3.64781523 - lr 0.0010 - time 33.08s
165
+ 2024-07-30 06:31:29,655 batch 201/672 - loss 3.65689145 - lr 0.0010 - time 50.65s
166
+ 2024-07-30 06:31:46,104 batch 268/672 - loss 3.66917406 - lr 0.0010 - time 67.10s
167
+ 2024-07-30 06:32:02,574 batch 335/672 - loss 3.68868699 - lr 0.0010 - time 83.57s
168
+ 2024-07-30 06:32:20,494 batch 402/672 - loss 3.69485565 - lr 0.0010 - time 101.49s
169
+ 2024-07-30 06:32:37,384 batch 469/672 - loss 3.70089396 - lr 0.0010 - time 118.38s
170
+ 2024-07-30 06:32:53,682 batch 536/672 - loss 3.70896337 - lr 0.0010 - time 134.68s
171
+ 2024-07-30 06:33:10,006 batch 603/672 - loss 3.70843371 - lr 0.0010 - time 151.00s
172
+ 2024-07-30 06:33:26,025 batch 670/672 - loss 3.71478230 - lr 0.0010 - time 167.02s
173
+ 2024-07-30 06:33:26,568 ----------------------------------------------------------------------------------------------------
174
+ 2024-07-30 06:33:26,569 EPOCH 8 DONE
175
+ 2024-07-30 06:33:34,155 TRAIN Loss: 3.7152
176
+ 2024-07-30 06:33:34,155 DEV Loss: 5.6872
177
+ 2024-07-30 06:33:34,155 DEV Perplexity: 295.0647
178
+ 2024-07-30 06:33:34,155 No improvement for 2 epoch(s)
179
+ 2024-07-30 06:33:34,155 ----------------------------------------------------------------------------------------------------
180
+ 2024-07-30 06:33:34,155 EPOCH 9
181
+ 2024-07-30 06:33:50,946 batch 67/672 - loss 3.49033382 - lr 0.0010 - time 16.79s
182
+ 2024-07-30 06:34:07,073 batch 134/672 - loss 3.52263643 - lr 0.0010 - time 32.92s
183
+ 2024-07-30 06:34:24,749 batch 201/672 - loss 3.52862982 - lr 0.0010 - time 50.59s
184
+ 2024-07-30 06:34:41,585 batch 268/672 - loss 3.52733515 - lr 0.0010 - time 67.43s
185
+ 2024-07-30 06:34:57,441 batch 335/672 - loss 3.53613153 - lr 0.0010 - time 83.29s
186
+ 2024-07-30 06:35:14,069 batch 402/672 - loss 3.54259493 - lr 0.0010 - time 99.91s
187
+ 2024-07-30 06:35:31,517 batch 469/672 - loss 3.55581814 - lr 0.0010 - time 117.36s
188
+ 2024-07-30 06:35:48,805 batch 536/672 - loss 3.56197018 - lr 0.0010 - time 134.65s
189
+ 2024-07-30 06:36:05,484 batch 603/672 - loss 3.56765597 - lr 0.0010 - time 151.33s
190
+ 2024-07-30 06:36:21,946 batch 670/672 - loss 3.57076058 - lr 0.0010 - time 167.79s
191
+ 2024-07-30 06:36:22,568 ----------------------------------------------------------------------------------------------------
192
+ 2024-07-30 06:36:22,568 EPOCH 9 DONE
193
+ 2024-07-30 06:36:30,201 TRAIN Loss: 3.5709
194
+ 2024-07-30 06:36:30,201 DEV Loss: 5.7353
195
+ 2024-07-30 06:36:30,201 DEV Perplexity: 309.6049
196
+ 2024-07-30 06:36:30,201 No improvement for 3 epoch(s)
197
+ 2024-07-30 06:36:30,201 ----------------------------------------------------------------------------------------------------
198
+ 2024-07-30 06:36:30,201 EPOCH 10
199
+ 2024-07-30 06:36:48,774 batch 67/672 - loss 3.41243390 - lr 0.0010 - time 18.57s
200
+ 2024-07-30 06:37:04,497 batch 134/672 - loss 3.42528026 - lr 0.0010 - time 34.30s
201
+ 2024-07-30 06:37:21,278 batch 201/672 - loss 3.43054076 - lr 0.0010 - time 51.08s
202
+ 2024-07-30 06:37:37,475 batch 268/672 - loss 3.43839387 - lr 0.0010 - time 67.27s
203
+ 2024-07-30 06:37:53,937 batch 335/672 - loss 3.44185624 - lr 0.0010 - time 83.74s
204
+ 2024-07-30 06:38:10,633 batch 402/672 - loss 3.45245039 - lr 0.0010 - time 100.43s
205
+ 2024-07-30 06:38:27,071 batch 469/672 - loss 3.46049836 - lr 0.0010 - time 116.87s
206
+ 2024-07-30 06:38:44,265 batch 536/672 - loss 3.46732673 - lr 0.0010 - time 134.06s
207
+ 2024-07-30 06:39:00,917 batch 603/672 - loss 3.47291412 - lr 0.0010 - time 150.72s
208
+ 2024-07-30 06:39:17,468 batch 670/672 - loss 3.47771312 - lr 0.0010 - time 167.27s
209
+ 2024-07-30 06:39:17,989 ----------------------------------------------------------------------------------------------------
210
+ 2024-07-30 06:39:17,989 EPOCH 10 DONE
211
+ 2024-07-30 06:39:25,310 TRAIN Loss: 3.4781
212
+ 2024-07-30 06:39:25,310 DEV Loss: 5.7133
213
+ 2024-07-30 06:39:25,310 DEV Perplexity: 302.8638
214
+ 2024-07-30 06:39:25,310 No improvement for 4 epoch(s)
215
+ 2024-07-30 06:39:25,310 ----------------------------------------------------------------------------------------------------
216
+ 2024-07-30 06:39:25,311 Finished Training
217
+ 2024-07-30 06:39:39,794 TEST Perplexity: 278.1948
218
+ 2024-07-30 06:42:09,660 TEST BLEU = 20.94 78.2/53.0/15.2/3.1 (BP = 1.000 ratio = 1.000 hyp_len = 101 ref_len = 101)
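The "New best score!" / "No improvement for N epoch(s)" messages above reflect patience-based early stopping on the dev loss (patience: 5 per the hyperparameters). A hypothetical sketch of that bookkeeping, replayed on this log's dev losses (the class and method names are illustrative, not the repository's actual code):

```python
class EarlyStopping:
    """Track the best dev loss and count epochs without improvement."""

    def __init__(self, patience: int = 5):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, dev_loss: float) -> bool:
        """Record one epoch's dev loss; return True if training should stop."""
        if dev_loss < self.best:
            self.best = dev_loss
            self.bad_epochs = 0      # "New best score!"
        else:
            self.bad_epochs += 1     # "No improvement for N epoch(s)"
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=5)
# Dev losses per epoch from the log above; max_epochs (10) is reached
# before the patience of 5 runs out.
for loss in [6.0754, 5.8563, 5.7668, 5.6346, 5.6667,
             5.6235, 5.6687, 5.6872, 5.7353, 5.7133]:
    if stopper.step(loss):
        break
print(stopper.bad_epochs)  # 4 — matches "No improvement for 4 epoch(s)"
```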
models/en2el/word_end2end_embeddings_without_attention/model.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c1afe67af97b81121d92af0477f53545172454066ecadffcad7df3d378912de0
3
+ size 160688628
models/en2el/word_word2vec_embeddings_with_attention/log.txt ADDED
@@ -0,0 +1,226 @@
+ 2024-07-30 06:42:19,036 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 06:42:19,036 Training Model
+ 2024-07-30 06:42:19,036 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 06:42:19,036 Translator(
+ (encoder): EncoderLSTM(
+ (embedding): Embedding(13968, 300, padding_idx=13963)
+ (dropout): Dropout(p=0.1, inplace=False)
+ (lstm): LSTM(300, 512, batch_first=True)
+ )
+ (decoder): DecoderLSTM(
+ (embedding): Embedding(20803, 300, padding_idx=20798)
+ (dropout): Dropout(p=0.1, inplace=False)
+ (lstm): LSTM(300, 512, batch_first=True)
+ (attention): DotProductAttention(
+ (softmax): Softmax(dim=-1)
+ (combined2hidden): Sequential(
+ (0): Linear(in_features=1024, out_features=512, bias=True)
+ (1): ReLU()
+ )
+ )
+ (hidden2vocab): Linear(in_features=512, out_features=20803, bias=True)
+ (log_softmax): LogSoftmax(dim=-1)
+ )
+ )
+ 2024-07-30 06:42:19,036 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 06:42:19,036 Training Hyperparameters:
+ 2024-07-30 06:42:19,036 - max_epochs: 10
+ 2024-07-30 06:42:19,036 - learning_rate: 0.001
+ 2024-07-30 06:42:19,036 - batch_size: 128
+ 2024-07-30 06:42:19,036 - patience: 5
+ 2024-07-30 06:42:19,036 - scheduler_patience: 3
+ 2024-07-30 06:42:19,036 - teacher_forcing_ratio: 0.5
+ 2024-07-30 06:42:19,036 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 06:42:19,036 Computational Parameters:
+ 2024-07-30 06:42:19,036 - num_workers: 4
+ 2024-07-30 06:42:19,036 - device: device(type='cuda', index=0)
+ 2024-07-30 06:42:19,036 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 06:42:19,036 Dataset Splits:
+ 2024-07-30 06:42:19,036 - train: 85949 data points
+ 2024-07-30 06:42:19,036 - dev: 12279 data points
+ 2024-07-30 06:42:19,036 - test: 24557 data points
+ 2024-07-30 06:42:19,036 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 06:42:19,036 EPOCH 1
+ 2024-07-30 06:42:32,600 batch 67/672 - loss 6.91662250 - lr 0.0010 - time 13.56s
+ 2024-07-30 06:42:46,850 batch 134/672 - loss 6.58639020 - lr 0.0010 - time 27.81s
+ 2024-07-30 06:43:03,155 batch 201/672 - loss 6.37018734 - lr 0.0010 - time 44.12s
+ 2024-07-30 06:43:17,069 batch 268/672 - loss 6.17948367 - lr 0.0010 - time 58.03s
+ 2024-07-30 06:43:30,601 batch 335/672 - loss 6.02470564 - lr 0.0010 - time 71.56s
+ 2024-07-30 06:43:44,107 batch 402/672 - loss 5.87935876 - lr 0.0010 - time 85.07s
+ 2024-07-30 06:43:57,997 batch 469/672 - loss 5.74927697 - lr 0.0010 - time 98.96s
+ 2024-07-30 06:44:11,599 batch 536/672 - loss 5.63319499 - lr 0.0010 - time 112.56s
+ 2024-07-30 06:44:25,891 batch 603/672 - loss 5.52629155 - lr 0.0010 - time 126.85s
+ 2024-07-30 06:44:39,789 batch 670/672 - loss 5.42313408 - lr 0.0010 - time 140.75s
+ 2024-07-30 06:44:40,240 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 06:44:40,241 EPOCH 1 DONE
+ 2024-07-30 06:44:47,014 TRAIN Loss: 5.4211
+ 2024-07-30 06:44:47,014 DEV Loss: 5.9553
+ 2024-07-30 06:44:47,014 DEV Perplexity: 385.7776
+ 2024-07-30 06:44:47,014 New best score!
+ 2024-07-30 06:44:47,016 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 06:44:47,016 EPOCH 2
+ 2024-07-30 06:45:01,330 batch 67/672 - loss 4.34568167 - lr 0.0010 - time 14.31s
+ 2024-07-30 06:45:14,682 batch 134/672 - loss 4.30953125 - lr 0.0010 - time 27.67s
+ 2024-07-30 06:45:28,887 batch 201/672 - loss 4.26858230 - lr 0.0010 - time 41.87s
+ 2024-07-30 06:45:42,207 batch 268/672 - loss 4.21925693 - lr 0.0010 - time 55.19s
+ 2024-07-30 06:45:56,919 batch 335/672 - loss 4.17999060 - lr 0.0010 - time 69.90s
+ 2024-07-30 06:46:11,287 batch 402/672 - loss 4.14158491 - lr 0.0010 - time 84.27s
+ 2024-07-30 06:46:25,173 batch 469/672 - loss 4.11603831 - lr 0.0010 - time 98.16s
+ 2024-07-30 06:46:39,445 batch 536/672 - loss 4.08914409 - lr 0.0010 - time 112.43s
+ 2024-07-30 06:46:53,748 batch 603/672 - loss 4.06401210 - lr 0.0010 - time 126.73s
+ 2024-07-30 06:47:07,806 batch 670/672 - loss 4.03871270 - lr 0.0010 - time 140.79s
+ 2024-07-30 06:47:08,334 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 06:47:08,334 EPOCH 2 DONE
+ 2024-07-30 06:47:15,082 TRAIN Loss: 4.0379
+ 2024-07-30 06:47:15,082 DEV Loss: 5.7967
+ 2024-07-30 06:47:15,082 DEV Perplexity: 329.2231
+ 2024-07-30 06:47:15,082 New best score!
+ 2024-07-30 06:47:15,083 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 06:47:15,083 EPOCH 3
+ 2024-07-30 06:47:28,509 batch 67/672 - loss 3.69339770 - lr 0.0010 - time 13.43s
+ 2024-07-30 06:47:42,923 batch 134/672 - loss 3.65598298 - lr 0.0010 - time 27.84s
+ 2024-07-30 06:47:58,031 batch 201/672 - loss 3.64756035 - lr 0.0010 - time 42.95s
+ 2024-07-30 06:48:12,009 batch 268/672 - loss 3.63135578 - lr 0.0010 - time 56.93s
+ 2024-07-30 06:48:27,330 batch 335/672 - loss 3.62346413 - lr 0.0010 - time 72.25s
+ 2024-07-30 06:48:40,553 batch 402/672 - loss 3.61437516 - lr 0.0010 - time 85.47s
+ 2024-07-30 06:48:54,137 batch 469/672 - loss 3.60578866 - lr 0.0010 - time 99.05s
+ 2024-07-30 06:49:07,565 batch 536/672 - loss 3.59678630 - lr 0.0010 - time 112.48s
+ 2024-07-30 06:49:22,231 batch 603/672 - loss 3.58649624 - lr 0.0010 - time 127.15s
+ 2024-07-30 06:49:35,514 batch 670/672 - loss 3.57849735 - lr 0.0010 - time 140.43s
+ 2024-07-30 06:49:35,960 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 06:49:35,960 EPOCH 3 DONE
+ 2024-07-30 06:49:42,713 TRAIN Loss: 3.5782
+ 2024-07-30 06:49:42,714 DEV Loss: 5.7040
+ 2024-07-30 06:49:42,714 DEV Perplexity: 300.0733
+ 2024-07-30 06:49:42,714 New best score!
+ 2024-07-30 06:49:42,715 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 06:49:42,715 EPOCH 4
+ 2024-07-30 06:49:57,171 batch 67/672 - loss 3.33502615 - lr 0.0010 - time 14.46s
+ 2024-07-30 06:50:12,312 batch 134/672 - loss 3.32770498 - lr 0.0010 - time 29.60s
+ 2024-07-30 06:50:26,724 batch 201/672 - loss 3.33591669 - lr 0.0010 - time 44.01s
+ 2024-07-30 06:50:41,286 batch 268/672 - loss 3.33694653 - lr 0.0010 - time 58.57s
+ 2024-07-30 06:50:56,556 batch 335/672 - loss 3.34340809 - lr 0.0010 - time 73.84s
+ 2024-07-30 06:51:10,394 batch 402/672 - loss 3.33700332 - lr 0.0010 - time 87.68s
+ 2024-07-30 06:51:24,178 batch 469/672 - loss 3.33116489 - lr 0.0010 - time 101.46s
+ 2024-07-30 06:51:37,693 batch 536/672 - loss 3.32581970 - lr 0.0010 - time 114.98s
+ 2024-07-30 06:51:50,988 batch 603/672 - loss 3.32199001 - lr 0.0010 - time 128.27s
+ 2024-07-30 06:52:04,369 batch 670/672 - loss 3.31883023 - lr 0.0010 - time 141.65s
+ 2024-07-30 06:52:04,835 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 06:52:04,836 EPOCH 4 DONE
+ 2024-07-30 06:52:11,660 TRAIN Loss: 3.3192
+ 2024-07-30 06:52:11,661 DEV Loss: 5.7777
+ 2024-07-30 06:52:11,661 DEV Perplexity: 323.0131
+ 2024-07-30 06:52:11,661 No improvement for 1 epoch(s)
+ 2024-07-30 06:52:11,661 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 06:52:11,661 EPOCH 5
+ 2024-07-30 06:52:25,914 batch 67/672 - loss 3.11046116 - lr 0.0010 - time 14.25s
+ 2024-07-30 06:52:39,686 batch 134/672 - loss 3.13300501 - lr 0.0010 - time 28.02s
+ 2024-07-30 06:52:54,769 batch 201/672 - loss 3.14832687 - lr 0.0010 - time 43.11s
+ 2024-07-30 06:53:08,273 batch 268/672 - loss 3.14089799 - lr 0.0010 - time 56.61s
+ 2024-07-30 06:53:22,345 batch 335/672 - loss 3.13774454 - lr 0.0010 - time 70.68s
+ 2024-07-30 06:53:35,557 batch 402/672 - loss 3.13777374 - lr 0.0010 - time 83.90s
+ 2024-07-30 06:53:50,088 batch 469/672 - loss 3.15051001 - lr 0.0010 - time 98.43s
+ 2024-07-30 06:54:04,119 batch 536/672 - loss 3.15820549 - lr 0.0010 - time 112.46s
+ 2024-07-30 06:54:18,606 batch 603/672 - loss 3.16131139 - lr 0.0010 - time 126.94s
+ 2024-07-30 06:54:31,805 batch 670/672 - loss 3.16425462 - lr 0.0010 - time 140.14s
+ 2024-07-30 06:54:32,239 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 06:54:32,240 EPOCH 5 DONE
+ 2024-07-30 06:54:39,039 TRAIN Loss: 3.1636
+ 2024-07-30 06:54:39,040 DEV Loss: 5.6186
+ 2024-07-30 06:54:39,040 DEV Perplexity: 275.5058
+ 2024-07-30 06:54:39,040 New best score!
+ 2024-07-30 06:54:39,041 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 06:54:39,042 EPOCH 6
+ 2024-07-30 06:54:53,590 batch 67/672 - loss 2.99883252 - lr 0.0010 - time 14.55s
+ 2024-07-30 06:55:07,958 batch 134/672 - loss 2.99964157 - lr 0.0010 - time 28.92s
+ 2024-07-30 06:55:21,606 batch 201/672 - loss 3.00070762 - lr 0.0010 - time 42.56s
+ 2024-07-30 06:55:34,927 batch 268/672 - loss 3.01043465 - lr 0.0010 - time 55.89s
+ 2024-07-30 06:55:49,908 batch 335/672 - loss 3.01090882 - lr 0.0010 - time 70.87s
+ 2024-07-30 06:56:05,512 batch 402/672 - loss 3.02350225 - lr 0.0010 - time 86.47s
+ 2024-07-30 06:56:19,152 batch 469/672 - loss 3.02613424 - lr 0.0010 - time 100.11s
+ 2024-07-30 06:56:32,073 batch 536/672 - loss 3.02768054 - lr 0.0010 - time 113.03s
+ 2024-07-30 06:56:45,343 batch 603/672 - loss 3.02796574 - lr 0.0010 - time 126.30s
+ 2024-07-30 06:56:59,269 batch 670/672 - loss 3.03443815 - lr 0.0010 - time 140.23s
+ 2024-07-30 06:56:59,740 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 06:56:59,741 EPOCH 6 DONE
+ 2024-07-30 06:57:06,691 TRAIN Loss: 3.0338
+ 2024-07-30 06:57:06,691 DEV Loss: 5.6732
+ 2024-07-30 06:57:06,691 DEV Perplexity: 290.9648
+ 2024-07-30 06:57:06,691 No improvement for 1 epoch(s)
+ 2024-07-30 06:57:06,691 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 06:57:06,691 EPOCH 7
+ 2024-07-30 06:57:21,541 batch 67/672 - loss 2.92923880 - lr 0.0010 - time 14.85s
+ 2024-07-30 06:57:36,336 batch 134/672 - loss 2.91138855 - lr 0.0010 - time 29.64s
+ 2024-07-30 06:57:50,883 batch 201/672 - loss 2.90436241 - lr 0.0010 - time 44.19s
+ 2024-07-30 06:58:04,396 batch 268/672 - loss 2.90658969 - lr 0.0010 - time 57.70s
+ 2024-07-30 06:58:18,863 batch 335/672 - loss 2.90434964 - lr 0.0010 - time 72.17s
+ 2024-07-30 06:58:32,776 batch 402/672 - loss 2.91041986 - lr 0.0010 - time 86.08s
+ 2024-07-30 06:58:46,520 batch 469/672 - loss 2.91567728 - lr 0.0010 - time 99.83s
+ 2024-07-30 06:59:00,192 batch 536/672 - loss 2.92089640 - lr 0.0010 - time 113.50s
+ 2024-07-30 06:59:13,775 batch 603/672 - loss 2.92147438 - lr 0.0010 - time 127.08s
+ 2024-07-30 06:59:28,109 batch 670/672 - loss 2.92499178 - lr 0.0010 - time 141.42s
+ 2024-07-30 06:59:28,618 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 06:59:28,618 EPOCH 7 DONE
+ 2024-07-30 06:59:35,415 TRAIN Loss: 2.9248
+ 2024-07-30 06:59:35,416 DEV Loss: 5.7527
+ 2024-07-30 06:59:35,416 DEV Perplexity: 315.0358
+ 2024-07-30 06:59:35,416 No improvement for 2 epoch(s)
+ 2024-07-30 06:59:35,416 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 06:59:35,416 EPOCH 8
+ 2024-07-30 06:59:49,950 batch 67/672 - loss 2.79975783 - lr 0.0010 - time 14.53s
+ 2024-07-30 07:00:03,000 batch 134/672 - loss 2.82621081 - lr 0.0010 - time 27.58s
+ 2024-07-30 07:00:16,861 batch 201/672 - loss 2.80679128 - lr 0.0010 - time 41.44s
+ 2024-07-30 07:00:30,944 batch 268/672 - loss 2.81276777 - lr 0.0010 - time 55.53s
+ 2024-07-30 07:00:43,886 batch 335/672 - loss 2.82274042 - lr 0.0010 - time 68.47s
+ 2024-07-30 07:00:58,613 batch 402/672 - loss 2.82992167 - lr 0.0010 - time 83.20s
+ 2024-07-30 07:01:12,876 batch 469/672 - loss 2.83434656 - lr 0.0010 - time 97.46s
+ 2024-07-30 07:01:28,459 batch 536/672 - loss 2.83759681 - lr 0.0010 - time 113.04s
+ 2024-07-30 07:01:42,183 batch 603/672 - loss 2.83786871 - lr 0.0010 - time 126.77s
+ 2024-07-30 07:01:56,224 batch 670/672 - loss 2.83878349 - lr 0.0010 - time 140.81s
+ 2024-07-30 07:01:56,659 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 07:01:56,659 EPOCH 8 DONE
+ 2024-07-30 07:02:03,544 TRAIN Loss: 2.8392
+ 2024-07-30 07:02:03,544 DEV Loss: 5.8149
+ 2024-07-30 07:02:03,544 DEV Perplexity: 335.2611
+ 2024-07-30 07:02:03,544 No improvement for 3 epoch(s)
+ 2024-07-30 07:02:03,544 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 07:02:03,544 EPOCH 9
+ 2024-07-30 07:02:18,195 batch 67/672 - loss 2.69513035 - lr 0.0010 - time 14.65s
+ 2024-07-30 07:02:32,643 batch 134/672 - loss 2.71378022 - lr 0.0010 - time 29.10s
+ 2024-07-30 07:02:47,249 batch 201/672 - loss 2.72227977 - lr 0.0010 - time 43.70s
+ 2024-07-30 07:03:00,620 batch 268/672 - loss 2.72058778 - lr 0.0010 - time 57.08s
+ 2024-07-30 07:03:14,774 batch 335/672 - loss 2.73157986 - lr 0.0010 - time 71.23s
+ 2024-07-30 07:03:28,485 batch 402/672 - loss 2.73037312 - lr 0.0010 - time 84.94s
+ 2024-07-30 07:03:41,666 batch 469/672 - loss 2.74071470 - lr 0.0010 - time 98.12s
+ 2024-07-30 07:03:56,140 batch 536/672 - loss 2.74601518 - lr 0.0010 - time 112.60s
+ 2024-07-30 07:04:09,579 batch 603/672 - loss 2.75315223 - lr 0.0010 - time 126.03s
+ 2024-07-30 07:04:24,772 batch 670/672 - loss 2.75969398 - lr 0.0010 - time 141.23s
+ 2024-07-30 07:04:25,239 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 07:04:25,240 EPOCH 9 DONE
+ 2024-07-30 07:04:32,150 TRAIN Loss: 2.7603
+ 2024-07-30 07:04:32,151 DEV Loss: 5.7953
+ 2024-07-30 07:04:32,151 DEV Perplexity: 328.7462
+ 2024-07-30 07:04:32,151 No improvement for 4 epoch(s)
+ 2024-07-30 07:04:32,151 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 07:04:32,151 EPOCH 10
+ 2024-07-30 07:04:46,320 batch 67/672 - loss 2.63715485 - lr 0.0001 - time 14.17s
+ 2024-07-30 07:05:01,748 batch 134/672 - loss 2.61552854 - lr 0.0001 - time 29.60s
+ 2024-07-30 07:05:16,247 batch 201/672 - loss 2.59384394 - lr 0.0001 - time 44.10s
+ 2024-07-30 07:05:29,207 batch 268/672 - loss 2.58763264 - lr 0.0001 - time 57.06s
+ 2024-07-30 07:05:42,315 batch 335/672 - loss 2.58304355 - lr 0.0001 - time 70.16s
+ 2024-07-30 07:05:56,208 batch 402/672 - loss 2.58799308 - lr 0.0001 - time 84.06s
+ 2024-07-30 07:06:11,341 batch 469/672 - loss 2.58249684 - lr 0.0001 - time 99.19s
+ 2024-07-30 07:06:26,235 batch 536/672 - loss 2.57772345 - lr 0.0001 - time 114.08s
+ 2024-07-30 07:06:39,611 batch 603/672 - loss 2.57349586 - lr 0.0001 - time 127.46s
+ 2024-07-30 07:06:53,096 batch 670/672 - loss 2.57109329 - lr 0.0001 - time 140.94s
+ 2024-07-30 07:06:53,521 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 07:06:53,521 EPOCH 10 DONE
+ 2024-07-30 07:07:00,300 TRAIN Loss: 2.5705
+ 2024-07-30 07:07:00,301 DEV Loss: 5.7578
+ 2024-07-30 07:07:00,301 DEV Perplexity: 316.6407
+ 2024-07-30 07:07:00,301 No improvement for 5 epoch(s)
+ 2024-07-30 07:07:00,301 Patience reached: Terminating model training due to early stopping
+ 2024-07-30 07:07:00,301 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 07:07:00,301 Finished Training
+ 2024-07-30 07:07:13,155 TEST Perplexity: 277.4748
+ 2024-07-30 07:14:01,422 TEST BLEU = 5.05 26.8/14.0/3.4/0.5 (BP = 1.000 ratio = 3.484 hyp_len = 777 ref_len = 223)
models/en2el/word_word2vec_embeddings_with_attention/model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:64fb9c0e44c596e6389213fb31eb4025d9d71903b16ae76ac8ec1744c0851e28
+ size 101049192
models/en2el/word_word2vec_embeddings_without_attention/log.txt ADDED
@@ -0,0 +1,219 @@
+ 2024-07-30 07:14:10,581 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 07:14:10,581 Training Model
+ 2024-07-30 07:14:10,581 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 07:14:10,581 Translator(
+ (encoder): EncoderLSTM(
+ (embedding): Embedding(13968, 300, padding_idx=13963)
+ (dropout): Dropout(p=0.1, inplace=False)
+ (lstm): LSTM(300, 512, batch_first=True, bidirectional=True)
+ )
+ (decoder): DecoderLSTM(
+ (embedding): Embedding(20803, 300, padding_idx=20798)
+ (dropout): Dropout(p=0.1, inplace=False)
+ (lstm): LSTM(300, 1024, batch_first=True)
+ (hidden2vocab): Linear(in_features=1024, out_features=20803, bias=True)
+ (log_softmax): LogSoftmax(dim=-1)
+ )
+ )
+ 2024-07-30 07:14:10,581 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 07:14:10,581 Training Hyperparameters:
+ 2024-07-30 07:14:10,581 - max_epochs: 10
+ 2024-07-30 07:14:10,581 - learning_rate: 0.001
+ 2024-07-30 07:14:10,581 - batch_size: 128
+ 2024-07-30 07:14:10,581 - patience: 5
+ 2024-07-30 07:14:10,581 - scheduler_patience: 3
+ 2024-07-30 07:14:10,581 - teacher_forcing_ratio: 0.5
+ 2024-07-30 07:14:10,581 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 07:14:10,581 Computational Parameters:
+ 2024-07-30 07:14:10,581 - num_workers: 4
+ 2024-07-30 07:14:10,581 - device: device(type='cuda', index=0)
+ 2024-07-30 07:14:10,582 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 07:14:10,582 Dataset Splits:
+ 2024-07-30 07:14:10,582 - train: 85949 data points
+ 2024-07-30 07:14:10,582 - dev: 12279 data points
+ 2024-07-30 07:14:10,582 - test: 24557 data points
+ 2024-07-30 07:14:10,582 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 07:14:10,582 EPOCH 1
+ 2024-07-30 07:14:25,603 batch 67/672 - loss 6.79073620 - lr 0.0010 - time 15.02s
+ 2024-07-30 07:14:41,017 batch 134/672 - loss 6.43879131 - lr 0.0010 - time 30.44s
+ 2024-07-30 07:14:57,339 batch 201/672 - loss 6.23417893 - lr 0.0010 - time 46.76s
+ 2024-07-30 07:15:13,686 batch 268/672 - loss 6.08148850 - lr 0.0010 - time 63.10s
+ 2024-07-30 07:15:28,795 batch 335/672 - loss 5.97226765 - lr 0.0010 - time 78.21s
+ 2024-07-30 07:15:43,691 batch 402/672 - loss 5.87810853 - lr 0.0010 - time 93.11s
+ 2024-07-30 07:16:00,445 batch 469/672 - loss 5.79837992 - lr 0.0010 - time 109.86s
+ 2024-07-30 07:16:16,168 batch 536/672 - loss 5.73229925 - lr 0.0010 - time 125.59s
+ 2024-07-30 07:16:31,288 batch 603/672 - loss 5.66768689 - lr 0.0010 - time 140.71s
+ 2024-07-30 07:16:47,155 batch 670/672 - loss 5.61437018 - lr 0.0010 - time 156.57s
+ 2024-07-30 07:16:47,648 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 07:16:47,649 EPOCH 1 DONE
+ 2024-07-30 07:16:55,238 TRAIN Loss: 5.6123
+ 2024-07-30 07:16:55,238 DEV Loss: 6.0562
+ 2024-07-30 07:16:55,238 DEV Perplexity: 426.7618
+ 2024-07-30 07:16:55,238 New best score!
+ 2024-07-30 07:16:55,240 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 07:16:55,240 EPOCH 2
+ 2024-07-30 07:17:11,669 batch 67/672 - loss 4.98554675 - lr 0.0010 - time 16.43s
+ 2024-07-30 07:17:27,224 batch 134/672 - loss 4.96879658 - lr 0.0010 - time 31.98s
+ 2024-07-30 07:17:42,297 batch 201/672 - loss 4.95167577 - lr 0.0010 - time 47.06s
+ 2024-07-30 07:17:58,306 batch 268/672 - loss 4.92793236 - lr 0.0010 - time 63.07s
+ 2024-07-30 07:18:14,497 batch 335/672 - loss 4.90692339 - lr 0.0010 - time 79.26s
+ 2024-07-30 07:18:29,476 batch 402/672 - loss 4.88686806 - lr 0.0010 - time 94.24s
+ 2024-07-30 07:18:45,523 batch 469/672 - loss 4.87248768 - lr 0.0010 - time 110.28s
+ 2024-07-30 07:19:01,902 batch 536/672 - loss 4.85536665 - lr 0.0010 - time 126.66s
+ 2024-07-30 07:19:17,307 batch 603/672 - loss 4.84154147 - lr 0.0010 - time 142.07s
+ 2024-07-30 07:19:32,338 batch 670/672 - loss 4.82394058 - lr 0.0010 - time 157.10s
+ 2024-07-30 07:19:32,900 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 07:19:32,900 EPOCH 2 DONE
+ 2024-07-30 07:19:40,679 TRAIN Loss: 4.8233
+ 2024-07-30 07:19:40,679 DEV Loss: 5.7909
+ 2024-07-30 07:19:40,679 DEV Perplexity: 327.3106
+ 2024-07-30 07:19:40,679 New best score!
+ 2024-07-30 07:19:40,680 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 07:19:40,680 EPOCH 3
+ 2024-07-30 07:19:56,748 batch 67/672 - loss 4.52631765 - lr 0.0010 - time 16.07s
+ 2024-07-30 07:20:12,034 batch 134/672 - loss 4.53056113 - lr 0.0010 - time 31.35s
+ 2024-07-30 07:20:27,505 batch 201/672 - loss 4.52525295 - lr 0.0010 - time 46.83s
+ 2024-07-30 07:20:43,099 batch 268/672 - loss 4.52333133 - lr 0.0010 - time 62.42s
+ 2024-07-30 07:20:58,649 batch 335/672 - loss 4.51516634 - lr 0.0010 - time 77.97s
+ 2024-07-30 07:21:14,567 batch 402/672 - loss 4.50991095 - lr 0.0010 - time 93.89s
+ 2024-07-30 07:21:30,492 batch 469/672 - loss 4.50259467 - lr 0.0010 - time 109.81s
+ 2024-07-30 07:21:45,460 batch 536/672 - loss 4.49745491 - lr 0.0010 - time 124.78s
+ 2024-07-30 07:22:01,175 batch 603/672 - loss 4.49024783 - lr 0.0010 - time 140.49s
+ 2024-07-30 07:22:17,167 batch 670/672 - loss 4.48228455 - lr 0.0010 - time 156.49s
+ 2024-07-30 07:22:17,640 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 07:22:17,641 EPOCH 3 DONE
+ 2024-07-30 07:22:25,117 TRAIN Loss: 4.4828
+ 2024-07-30 07:22:25,117 DEV Loss: 5.6332
+ 2024-07-30 07:22:25,117 DEV Perplexity: 279.5651
+ 2024-07-30 07:22:25,117 New best score!
+ 2024-07-30 07:22:25,119 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 07:22:25,119 EPOCH 4
+ 2024-07-30 07:22:40,730 batch 67/672 - loss 4.24072004 - lr 0.0010 - time 15.61s
+ 2024-07-30 07:22:56,145 batch 134/672 - loss 4.23735875 - lr 0.0010 - time 31.03s
+ 2024-07-30 07:23:11,635 batch 201/672 - loss 4.23973921 - lr 0.0010 - time 46.52s
+ 2024-07-30 07:23:26,695 batch 268/672 - loss 4.23747351 - lr 0.0010 - time 61.58s
+ 2024-07-30 07:23:42,360 batch 335/672 - loss 4.24118206 - lr 0.0010 - time 77.24s
+ 2024-07-30 07:23:57,099 batch 402/672 - loss 4.23950315 - lr 0.0010 - time 91.98s
+ 2024-07-30 07:24:13,509 batch 469/672 - loss 4.24106605 - lr 0.0010 - time 108.39s
+ 2024-07-30 07:24:29,084 batch 536/672 - loss 4.23772152 - lr 0.0010 - time 123.96s
+ 2024-07-30 07:24:45,404 batch 603/672 - loss 4.24087652 - lr 0.0010 - time 140.28s
+ 2024-07-30 07:25:00,880 batch 670/672 - loss 4.23783506 - lr 0.0010 - time 155.76s
+ 2024-07-30 07:25:01,389 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 07:25:01,389 EPOCH 4 DONE
+ 2024-07-30 07:25:08,934 TRAIN Loss: 4.2383
+ 2024-07-30 07:25:08,935 DEV Loss: 5.5376
+ 2024-07-30 07:25:08,935 DEV Perplexity: 254.0605
+ 2024-07-30 07:25:08,935 New best score!
+ 2024-07-30 07:25:08,936 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 07:25:08,936 EPOCH 5
+ 2024-07-30 07:25:24,139 batch 67/672 - loss 4.01658248 - lr 0.0010 - time 15.20s
+ 2024-07-30 07:25:41,531 batch 134/672 - loss 4.00365573 - lr 0.0010 - time 32.59s
+ 2024-07-30 07:25:57,109 batch 201/672 - loss 4.02020892 - lr 0.0010 - time 48.17s
+ 2024-07-30 07:26:12,417 batch 268/672 - loss 4.02071435 - lr 0.0010 - time 63.48s
+ 2024-07-30 07:26:27,886 batch 335/672 - loss 4.02673044 - lr 0.0010 - time 78.95s
+ 2024-07-30 07:26:43,953 batch 402/672 - loss 4.03282895 - lr 0.0010 - time 95.02s
+ 2024-07-30 07:26:58,885 batch 469/672 - loss 4.03949336 - lr 0.0010 - time 109.95s
+ 2024-07-30 07:27:13,639 batch 536/672 - loss 4.04385120 - lr 0.0010 - time 124.70s
+ 2024-07-30 07:27:29,045 batch 603/672 - loss 4.04219495 - lr 0.0010 - time 140.11s
+ 2024-07-30 07:27:45,163 batch 670/672 - loss 4.04155590 - lr 0.0010 - time 156.23s
+ 2024-07-30 07:27:45,688 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 07:27:45,688 EPOCH 5 DONE
+ 2024-07-30 07:27:53,366 TRAIN Loss: 4.0414
+ 2024-07-30 07:27:53,367 DEV Loss: 5.5077
+ 2024-07-30 07:27:53,367 DEV Perplexity: 246.5857
+ 2024-07-30 07:27:53,367 New best score!
+ 2024-07-30 07:27:53,368 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 07:27:53,368 EPOCH 6
+ 2024-07-30 07:28:08,954 batch 67/672 - loss 3.81063235 - lr 0.0010 - time 15.59s
+ 2024-07-30 07:28:25,891 batch 134/672 - loss 3.84132829 - lr 0.0010 - time 32.52s
+ 2024-07-30 07:28:41,893 batch 201/672 - loss 3.86148431 - lr 0.0010 - time 48.52s
+ 2024-07-30 07:28:57,642 batch 268/672 - loss 3.86686947 - lr 0.0010 - time 64.27s
+ 2024-07-30 07:29:12,267 batch 335/672 - loss 3.86267434 - lr 0.0010 - time 78.90s
+ 2024-07-30 07:29:28,723 batch 402/672 - loss 3.87000411 - lr 0.0010 - time 95.35s
+ 2024-07-30 07:29:44,486 batch 469/672 - loss 3.88109252 - lr 0.0010 - time 111.12s
+ 2024-07-30 07:29:59,859 batch 536/672 - loss 3.88538233 - lr 0.0010 - time 126.49s
+ 2024-07-30 07:30:15,655 batch 603/672 - loss 3.89046454 - lr 0.0010 - time 142.29s
+ 2024-07-30 07:30:31,567 batch 670/672 - loss 3.89366428 - lr 0.0010 - time 158.20s
+ 2024-07-30 07:30:32,032 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 07:30:32,032 EPOCH 6 DONE
+ 2024-07-30 07:30:39,563 TRAIN Loss: 3.8940
+ 2024-07-30 07:30:39,563 DEV Loss: 5.5762
+ 2024-07-30 07:30:39,564 DEV Perplexity: 264.0631
+ 2024-07-30 07:30:39,564 No improvement for 1 epoch(s)
+ 2024-07-30 07:30:39,564 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 07:30:39,564 EPOCH 7
+ 2024-07-30 07:30:55,671 batch 67/672 - loss 3.68092326 - lr 0.0010 - time 16.11s
+ 2024-07-30 07:31:11,517 batch 134/672 - loss 3.68208616 - lr 0.0010 - time 31.95s
+ 2024-07-30 07:31:26,404 batch 201/672 - loss 3.69134398 - lr 0.0010 - time 46.84s
+ 2024-07-30 07:31:41,928 batch 268/672 - loss 3.70132633 - lr 0.0010 - time 62.36s
+ 2024-07-30 07:31:58,864 batch 335/672 - loss 3.70940298 - lr 0.0010 - time 79.30s
+ 2024-07-30 07:32:14,648 batch 402/672 - loss 3.71866960 - lr 0.0010 - time 95.08s
+ 2024-07-30 07:32:30,527 batch 469/672 - loss 3.72513837 - lr 0.0010 - time 110.96s
+ 2024-07-30 07:32:45,411 batch 536/672 - loss 3.73284457 - lr 0.0010 - time 125.85s
+ 2024-07-30 07:33:01,048 batch 603/672 - loss 3.73751154 - lr 0.0010 - time 141.48s
+ 2024-07-30 07:33:16,229 batch 670/672 - loss 3.74245008 - lr 0.0010 - time 156.67s
+ 2024-07-30 07:33:16,726 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 07:33:16,727 EPOCH 7 DONE
+ 2024-07-30 07:33:24,325 TRAIN Loss: 3.7419
+ 2024-07-30 07:33:24,326 DEV Loss: 5.6171
+ 2024-07-30 07:33:24,326 DEV Perplexity: 275.0885
+ 2024-07-30 07:33:24,326 No improvement for 2 epoch(s)
+ 2024-07-30 07:33:24,326 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 07:33:24,326 EPOCH 8
+ 2024-07-30 07:33:39,813 batch 67/672 - loss 3.55043242 - lr 0.0010 - time 15.49s
+ 2024-07-30 07:33:56,610 batch 134/672 - loss 3.57412469 - lr 0.0010 - time 32.28s
+ 2024-07-30 07:34:12,278 batch 201/672 - loss 3.56322409 - lr 0.0010 - time 47.95s
+ 2024-07-30 07:34:28,544 batch 268/672 - loss 3.57510089 - lr 0.0010 - time 64.22s
+ 2024-07-30 07:34:43,774 batch 335/672 - loss 3.58766795 - lr 0.0010 - time 79.45s
+ 2024-07-30 07:34:59,154 batch 402/672 - loss 3.59306457 - lr 0.0010 - time 94.83s
+ 2024-07-30 07:35:14,306 batch 469/672 - loss 3.60238248 - lr 0.0010 - time 109.98s
+ 2024-07-30 07:35:29,591 batch 536/672 - loss 3.60817340 - lr 0.0010 - time 125.27s
+ 2024-07-30 07:35:44,801 batch 603/672 - loss 3.61315201 - lr 0.0010 - time 140.48s
+ 2024-07-30 07:36:00,948 batch 670/672 - loss 3.62142709 - lr 0.0010 - time 156.62s
+ 2024-07-30 07:36:01,426 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 07:36:01,426 EPOCH 8 DONE
+ 2024-07-30 07:36:09,156 TRAIN Loss: 3.6211
+ 2024-07-30 07:36:09,157 DEV Loss: 5.6400
+ 2024-07-30 07:36:09,157 DEV Perplexity: 281.4667
+ 2024-07-30 07:36:09,157 No improvement for 3 epoch(s)
+ 2024-07-30 07:36:09,157 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 07:36:09,157 EPOCH 9
+ 2024-07-30 07:36:24,975 batch 67/672 - loss 3.42633642 - lr 0.0010 - time 15.82s
+ 2024-07-30 07:36:40,416 batch 134/672 - loss 3.43367955 - lr 0.0010 - time 31.26s
+ 2024-07-30 07:36:55,330 batch 201/672 - loss 3.46320351 - lr 0.0010 - time 46.17s
+ 2024-07-30 07:37:10,791 batch 268/672 - loss 3.47078220 - lr 0.0010 - time 61.63s
+ 2024-07-30 07:37:26,989 batch 335/672 - loss 3.48317769 - lr 0.0010 - time 77.83s
+ 2024-07-30 07:37:42,252 batch 402/672 - loss 3.48588328 - lr 0.0010 - time 93.10s
+ 2024-07-30 07:37:58,368 batch 469/672 - loss 3.49334856 - lr 0.0010 - time 109.21s
+ 2024-07-30 07:38:14,294 batch 536/672 - loss 3.50088324 - lr 0.0010 - time 125.14s
+ 2024-07-30 07:38:30,203 batch 603/672 - loss 3.50751225 - lr 0.0010 - time 141.05s
+ 2024-07-30 07:38:45,918 batch 670/672 - loss 3.51215252 - lr 0.0010 - time 156.76s
+ 2024-07-30 07:38:46,374 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 07:38:46,374 EPOCH 9 DONE
+ 2024-07-30 07:38:53,965 TRAIN Loss: 3.5123
+ 2024-07-30 07:38:53,965 DEV Loss: 5.6476
+ 2024-07-30 07:38:53,965 DEV Perplexity: 283.6166
+ 2024-07-30 07:38:53,965 No improvement for 4 epoch(s)
+ 2024-07-30 07:38:53,965 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 07:38:53,965 EPOCH 10
+ 2024-07-30 07:39:09,100 batch 67/672 - loss 3.30588978 - lr 0.0001 - time 15.13s
+ 2024-07-30 07:39:24,632 batch 134/672 - loss 3.30424578 - lr 0.0001 - time 30.67s
+ 2024-07-30 07:39:40,766 batch 201/672 - loss 3.29494134 - lr 0.0001 - time 46.80s
+ 2024-07-30 07:39:56,061 batch 268/672 - loss 3.28674206 - lr 0.0001 - time 62.10s
+ 2024-07-30 07:40:11,796 batch 335/672 - loss 3.28545445 - lr 0.0001 - time 77.83s
+ 2024-07-30 07:40:28,788 batch 402/672 - loss 3.28475093 - lr 0.0001 - time 94.82s
+ 2024-07-30 07:40:44,628 batch 469/672 - loss 3.28263446 - lr 0.0001 - time 110.66s
+ 2024-07-30 07:40:59,153 batch 536/672 - loss 3.27839362 - lr 0.0001 - time 125.19s
+ 2024-07-30 07:41:14,662 batch 603/672 - loss 3.27487768 - lr 0.0001 - time 140.70s
+ 2024-07-30 07:41:30,672 batch 670/672 - loss 3.27459330 - lr 0.0001 - time 156.71s
+ 2024-07-30 07:41:31,143 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 07:41:31,144 EPOCH 10 DONE
+ 2024-07-30 07:41:38,732 TRAIN Loss: 3.2747
+ 2024-07-30 07:41:38,733 DEV Loss: 5.6622
+ 2024-07-30 07:41:38,733 DEV Perplexity: 287.7808
+ 2024-07-30 07:41:38,733 No improvement for 5 epoch(s)
+ 2024-07-30 07:41:38,733 Patience reached: Terminating model training due to early stopping
+ 2024-07-30 07:41:38,733 ----------------------------------------------------------------------------------------------------
+ 2024-07-30 07:41:38,733 Finished Training
+ 2024-07-30 07:41:53,047 TEST Perplexity: 247.5392
+ 2024-07-30 07:44:49,655 TEST BLEU = 24.44 72.8/46.2/18.9/5.6 (BP = 1.000 ratio = 1.000 hyp_len = 92 ref_len = 92)
models/en2el/word_word2vec_embeddings_without_attention/model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:31963994d3abcabe5c0f9d0dec8f23ce1972703bacc8781422b0cfc8e853edbb
+ size 163279412