bongseok committed on
Commit 9ce3e32 · 1 Parent(s): 28eb9fc

End of training

Files changed (4)
  1. all_results.json +8 -0
  2. train_results.json +8 -0
  3. train_results.txt +6 -0
  4. trainer_state.json +3359 -0
all_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+     "epoch": 5.0,
+     "train_loss": 1.7010083939733822,
+     "train_runtime": 6451.2169,
+     "train_samples": 42367,
+     "train_samples_per_second": 32.836,
+     "train_steps_per_second": 8.209
+ }
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+     "epoch": 5.0,
+     "train_loss": 1.7010083939733822,
+     "train_runtime": 6451.2169,
+     "train_samples": 42367,
+     "train_samples_per_second": 32.836,
+     "train_steps_per_second": 8.209
+ }
train_results.txt ADDED
@@ -0,0 +1,6 @@
+ epoch = 5.0
+ train_loss = 1.7010083939733822
+ train_runtime = 6451.2169
+ train_samples = 42367
+ train_samples_per_second = 32.836
+ train_steps_per_second = 8.209
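The throughput figures in these result files can be cross-checked against the other reported fields. The sketch below assumes (an unconfirmed reading of how the values were derived) that samples per second is the total number of samples seen, `train_samples` times `epoch`, divided by `train_runtime`, and that steps per second is `global_step` divided by `train_runtime`. The `throughput` helper is hypothetical, and `GLOBAL_STEP` is copied from the `global_step` field of trainer_state.json in this commit.

```python
# Values copied from train_results.json in this commit.
results = {
    "epoch": 5.0,
    "train_runtime": 6451.2169,
    "train_samples": 42367,
    "train_samples_per_second": 32.836,
    "train_steps_per_second": 8.209,
}

# "global_step" as reported in trainer_state.json.
GLOBAL_STEP = 52960


def throughput(results, global_step):
    """Recompute throughput, ASSUMING each rate is a total divided by runtime."""
    runtime = results["train_runtime"]
    samples_seen = results["train_samples"] * results["epoch"]
    samples_per_s = round(samples_seen / runtime, 3)
    steps_per_s = round(global_step / runtime, 3)
    return samples_per_s, steps_per_s


samples_per_s, steps_per_s = throughput(results, GLOBAL_STEP)
print("recomputed:", samples_per_s, steps_per_s)
print("reported:  ", results["train_samples_per_second"],
      results["train_steps_per_second"])
```

If the recomputed and reported values agree to three decimals, the files are internally consistent under that assumption; a mismatch would suggest the rates were computed some other way.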
trainer_state.json ADDED
@@ -0,0 +1,3359 @@
+ {
+   "best_metric": 2.720076322555542,
+   "best_model_checkpoint": "./models/kobart_4_5.6e-5_datav2_min30_lp5.0_temperature1.0/checkpoint-20000",
+   "epoch": 5.0,
+   "global_step": 52960,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.01,
+       "learning_rate": 1.0574018126888216e-06,
+       "loss": 9.5934,
+       "step": 100
+     },
+     {
+       "epoch": 0.02,
+       "learning_rate": 2.1148036253776433e-06,
+       "loss": 4.9023,
+       "step": 200
+     },
+     {
+       "epoch": 0.03,
+       "learning_rate": 3.1722054380664653e-06,
+       "loss": 3.5742,
+       "step": 300
+     },
+     {
+       "epoch": 0.04,
+       "learning_rate": 4.2296072507552865e-06,
+       "loss": 3.2585,
+       "step": 400
+     },
+     {
+       "epoch": 0.05,
+       "learning_rate": 5.287009063444109e-06,
+       "loss": 3.044,
+       "step": 500
+     },
+     {
+       "epoch": 0.06,
+       "learning_rate": 6.344410876132931e-06,
+       "loss": 2.9663,
+       "step": 600
+     },
+     {
+       "epoch": 0.07,
+       "learning_rate": 7.401812688821753e-06,
+       "loss": 2.7921,
+       "step": 700
+     },
+     {
+       "epoch": 0.08,
+       "learning_rate": 8.459214501510573e-06,
+       "loss": 2.794,
+       "step": 800
+     },
+     {
+       "epoch": 0.08,
+       "learning_rate": 9.516616314199396e-06,
+       "loss": 2.6951,
+       "step": 900
+     },
+     {
+       "epoch": 0.09,
+       "learning_rate": 1.0574018126888217e-05,
+       "loss": 2.7445,
+       "step": 1000
+     },
+     {
+       "epoch": 0.1,
+       "learning_rate": 1.1631419939577038e-05,
+       "loss": 2.6744,
+       "step": 1100
+     },
+     {
+       "epoch": 0.11,
+       "learning_rate": 1.2688821752265861e-05,
+       "loss": 2.644,
+       "step": 1200
+     },
+     {
+       "epoch": 0.12,
+       "learning_rate": 1.3746223564954682e-05,
+       "loss": 2.5615,
+       "step": 1300
+     },
+     {
+       "epoch": 0.13,
+       "learning_rate": 1.4803625377643505e-05,
+       "loss": 2.6068,
+       "step": 1400
+     },
+     {
+       "epoch": 0.14,
+       "learning_rate": 1.5861027190332325e-05,
+       "loss": 2.5422,
+       "step": 1500
+     },
+     {
+       "epoch": 0.15,
+       "learning_rate": 1.6918429003021146e-05,
+       "loss": 2.5506,
+       "step": 1600
+     },
+     {
+       "epoch": 0.16,
+       "learning_rate": 1.797583081570997e-05,
+       "loss": 2.5316,
+       "step": 1700
+     },
+     {
+       "epoch": 0.17,
+       "learning_rate": 1.9033232628398792e-05,
+       "loss": 2.4983,
+       "step": 1800
+     },
+     {
+       "epoch": 0.18,
+       "learning_rate": 2.0090634441087613e-05,
+       "loss": 2.4222,
+       "step": 1900
+     },
+     {
+       "epoch": 0.19,
+       "learning_rate": 2.1148036253776434e-05,
+       "loss": 2.5043,
+       "step": 2000
+     },
+     {
+       "epoch": 0.2,
+       "learning_rate": 2.2205438066465256e-05,
+       "loss": 2.4956,
+       "step": 2100
+     },
+     {
+       "epoch": 0.21,
+       "learning_rate": 2.3262839879154077e-05,
+       "loss": 2.503,
+       "step": 2200
+     },
+     {
+       "epoch": 0.22,
+       "learning_rate": 2.43202416918429e-05,
+       "loss": 2.4681,
+       "step": 2300
+     },
+     {
+       "epoch": 0.23,
+       "learning_rate": 2.5377643504531723e-05,
+       "loss": 2.4417,
+       "step": 2400
+     },
+     {
+       "epoch": 0.24,
+       "learning_rate": 2.6435045317220544e-05,
+       "loss": 2.3993,
+       "step": 2500
+     },
+     {
+       "epoch": 0.25,
+       "learning_rate": 2.7492447129909365e-05,
+       "loss": 2.424,
+       "step": 2600
+     },
+     {
+       "epoch": 0.25,
+       "learning_rate": 2.8549848942598183e-05,
+       "loss": 2.4301,
+       "step": 2700
+     },
+     {
+       "epoch": 0.26,
+       "learning_rate": 2.960725075528701e-05,
+       "loss": 2.5465,
+       "step": 2800
+     },
+     {
+       "epoch": 0.27,
+       "learning_rate": 3.066465256797583e-05,
+       "loss": 2.398,
+       "step": 2900
+     },
+     {
+       "epoch": 0.28,
+       "learning_rate": 3.172205438066465e-05,
+       "loss": 2.4061,
+       "step": 3000
+     },
+     {
+       "epoch": 0.29,
+       "learning_rate": 3.277945619335347e-05,
+       "loss": 2.4014,
+       "step": 3100
+     },
+     {
+       "epoch": 0.3,
+       "learning_rate": 3.383685800604229e-05,
+       "loss": 2.4134,
+       "step": 3200
+     },
+     {
+       "epoch": 0.31,
+       "learning_rate": 3.4894259818731113e-05,
+       "loss": 2.4089,
+       "step": 3300
+     },
+     {
+       "epoch": 0.32,
+       "learning_rate": 3.595166163141994e-05,
+       "loss": 2.4397,
+       "step": 3400
+     },
+     {
+       "epoch": 0.33,
+       "learning_rate": 3.700906344410876e-05,
+       "loss": 2.3923,
+       "step": 3500
+     },
+     {
+       "epoch": 0.34,
+       "learning_rate": 3.8066465256797584e-05,
+       "loss": 2.3635,
+       "step": 3600
+     },
+     {
+       "epoch": 0.35,
+       "learning_rate": 3.9123867069486405e-05,
+       "loss": 2.3589,
+       "step": 3700
+     },
+     {
+       "epoch": 0.36,
+       "learning_rate": 4.0181268882175226e-05,
+       "loss": 2.3856,
+       "step": 3800
+     },
+     {
+       "epoch": 0.37,
+       "learning_rate": 4.123867069486405e-05,
+       "loss": 2.3656,
+       "step": 3900
+     },
+     {
+       "epoch": 0.38,
+       "learning_rate": 4.229607250755287e-05,
+       "loss": 2.402,
+       "step": 4000
+     },
+     {
+       "epoch": 0.39,
+       "learning_rate": 4.335347432024169e-05,
+       "loss": 2.4819,
+       "step": 4100
+     },
+     {
+       "epoch": 0.4,
+       "learning_rate": 4.441087613293051e-05,
+       "loss": 2.4714,
+       "step": 4200
+     },
+     {
+       "epoch": 0.41,
+       "learning_rate": 4.546827794561933e-05,
+       "loss": 2.3861,
+       "step": 4300
+     },
+     {
+       "epoch": 0.42,
+       "learning_rate": 4.6525679758308154e-05,
+       "loss": 2.4166,
+       "step": 4400
+     },
+     {
+       "epoch": 0.42,
+       "learning_rate": 4.7583081570996975e-05,
+       "loss": 2.3908,
+       "step": 4500
+     },
+     {
+       "epoch": 0.43,
+       "learning_rate": 4.86404833836858e-05,
+       "loss": 2.4096,
+       "step": 4600
+     },
+     {
+       "epoch": 0.44,
+       "learning_rate": 4.9697885196374624e-05,
+       "loss": 2.3846,
+       "step": 4700
+     },
+     {
+       "epoch": 0.45,
+       "learning_rate": 5.0755287009063445e-05,
+       "loss": 2.3477,
+       "step": 4800
+     },
+     {
+       "epoch": 0.46,
+       "learning_rate": 5.1812688821752266e-05,
+       "loss": 2.4371,
+       "step": 4900
+     },
+     {
+       "epoch": 0.47,
+       "learning_rate": 5.287009063444109e-05,
+       "loss": 2.3968,
+       "step": 5000
+     },
+     {
+       "epoch": 0.47,
+       "eval_bleu1": 27.0594,
+       "eval_bleu2": 15.1133,
+       "eval_bleu3": 8.4503,
+       "eval_bleu4": 4.564,
+       "eval_gen_len": 48.5501,
+       "eval_loss": 2.9096498489379883,
+       "eval_rouge1": 32.7469,
+       "eval_rouge2": 10.9679,
+       "eval_rougeL": 21.4954,
+       "eval_runtime": 125.8926,
+       "eval_samples_per_second": 3.408,
+       "eval_steps_per_second": 0.032,
+       "step": 5000
+     },
+     {
+       "epoch": 0.48,
+       "learning_rate": 5.392749244712991e-05,
+       "loss": 2.3874,
+       "step": 5100
+     },
+     {
+       "epoch": 0.49,
+       "learning_rate": 5.498489425981873e-05,
+       "loss": 2.3827,
+       "step": 5200
+     },
+     {
+       "epoch": 0.5,
+       "learning_rate": 5.599530043638805e-05,
+       "loss": 2.3515,
+       "step": 5300
+     },
+     {
+       "epoch": 0.51,
+       "learning_rate": 5.587781134608929e-05,
+       "loss": 2.4104,
+       "step": 5400
+     },
+     {
+       "epoch": 0.52,
+       "learning_rate": 5.5760322255790535e-05,
+       "loss": 2.423,
+       "step": 5500
+     },
+     {
+       "epoch": 0.53,
+       "learning_rate": 5.564283316549178e-05,
+       "loss": 2.4084,
+       "step": 5600
+     },
+     {
+       "epoch": 0.54,
+       "learning_rate": 5.5525344075193014e-05,
+       "loss": 2.3811,
+       "step": 5700
+     },
+     {
+       "epoch": 0.55,
+       "learning_rate": 5.5407854984894264e-05,
+       "loss": 2.4017,
+       "step": 5800
+     },
+     {
+       "epoch": 0.56,
+       "learning_rate": 5.52903658945955e-05,
+       "loss": 2.3276,
+       "step": 5900
+     },
+     {
+       "epoch": 0.57,
+       "learning_rate": 5.517287680429674e-05,
+       "loss": 2.4217,
+       "step": 6000
+     },
+     {
+       "epoch": 0.58,
+       "learning_rate": 5.5055387713997986e-05,
+       "loss": 2.3477,
+       "step": 6100
+     },
+     {
+       "epoch": 0.59,
+       "learning_rate": 5.493789862369923e-05,
+       "loss": 2.3782,
+       "step": 6200
+     },
+     {
+       "epoch": 0.59,
+       "learning_rate": 5.4820409533400465e-05,
+       "loss": 2.3795,
+       "step": 6300
+     },
+     {
+       "epoch": 0.6,
+       "learning_rate": 5.4702920443101714e-05,
+       "loss": 2.3955,
+       "step": 6400
+     },
+     {
+       "epoch": 0.61,
+       "learning_rate": 5.458543135280295e-05,
+       "loss": 2.3809,
+       "step": 6500
+     },
+     {
+       "epoch": 0.62,
+       "learning_rate": 5.446794226250419e-05,
+       "loss": 2.3147,
+       "step": 6600
+     },
+     {
+       "epoch": 0.63,
+       "learning_rate": 5.435045317220544e-05,
+       "loss": 2.4039,
+       "step": 6700
+     },
+     {
+       "epoch": 0.64,
+       "learning_rate": 5.423296408190668e-05,
+       "loss": 2.3404,
+       "step": 6800
+     },
+     {
+       "epoch": 0.65,
+       "learning_rate": 5.411547499160792e-05,
+       "loss": 2.2851,
+       "step": 6900
+     },
+     {
+       "epoch": 0.66,
+       "learning_rate": 5.3997985901309164e-05,
+       "loss": 2.3282,
+       "step": 7000
+     },
+     {
+       "epoch": 0.67,
+       "learning_rate": 5.388049681101041e-05,
+       "loss": 2.3535,
+       "step": 7100
+     },
+     {
+       "epoch": 0.68,
+       "learning_rate": 5.376300772071164e-05,
+       "loss": 2.359,
+       "step": 7200
+     },
+     {
+       "epoch": 0.69,
+       "learning_rate": 5.364551863041289e-05,
+       "loss": 2.3116,
+       "step": 7300
+     },
+     {
+       "epoch": 0.7,
+       "learning_rate": 5.352802954011413e-05,
+       "loss": 2.3784,
+       "step": 7400
+     },
+     {
+       "epoch": 0.71,
+       "learning_rate": 5.341054044981537e-05,
+       "loss": 2.3451,
+       "step": 7500
+     },
+     {
+       "epoch": 0.72,
+       "learning_rate": 5.3293051359516615e-05,
+       "loss": 2.3174,
+       "step": 7600
+     },
+     {
+       "epoch": 0.73,
+       "learning_rate": 5.317556226921786e-05,
+       "loss": 2.3198,
+       "step": 7700
+     },
+     {
+       "epoch": 0.74,
+       "learning_rate": 5.3058073178919094e-05,
+       "loss": 2.3014,
+       "step": 7800
+     },
+     {
+       "epoch": 0.75,
+       "learning_rate": 5.294058408862034e-05,
+       "loss": 2.3443,
+       "step": 7900
+     },
+     {
+       "epoch": 0.76,
+       "learning_rate": 5.2823094998321586e-05,
+       "loss": 2.3317,
+       "step": 8000
+     },
+     {
+       "epoch": 0.76,
+       "learning_rate": 5.270560590802282e-05,
+       "loss": 2.2974,
+       "step": 8100
+     },
+     {
+       "epoch": 0.77,
+       "learning_rate": 5.258811681772407e-05,
+       "loss": 2.3403,
+       "step": 8200
+     },
+     {
+       "epoch": 0.78,
+       "learning_rate": 5.247062772742531e-05,
+       "loss": 2.3128,
+       "step": 8300
+     },
+     {
+       "epoch": 0.79,
+       "learning_rate": 5.235313863712655e-05,
+       "loss": 2.3456,
+       "step": 8400
+     },
+     {
+       "epoch": 0.8,
+       "learning_rate": 5.2235649546827793e-05,
+       "loss": 2.3053,
+       "step": 8500
+     },
+     {
+       "epoch": 0.81,
+       "learning_rate": 5.2118160456529036e-05,
+       "loss": 2.3246,
+       "step": 8600
+     },
+     {
+       "epoch": 0.82,
+       "learning_rate": 5.200067136623028e-05,
+       "loss": 2.3298,
+       "step": 8700
+     },
+     {
+       "epoch": 0.83,
+       "learning_rate": 5.188318227593152e-05,
+       "loss": 2.3137,
+       "step": 8800
+     },
+     {
+       "epoch": 0.84,
+       "learning_rate": 5.176569318563276e-05,
+       "loss": 2.2685,
+       "step": 8900
+     },
+     {
+       "epoch": 0.85,
+       "learning_rate": 5.164820409533401e-05,
+       "loss": 2.3339,
+       "step": 9000
+     },
+     {
+       "epoch": 0.86,
+       "learning_rate": 5.1530715005035244e-05,
+       "loss": 2.2938,
+       "step": 9100
+     },
+     {
+       "epoch": 0.87,
+       "learning_rate": 5.1413225914736487e-05,
+       "loss": 2.2663,
+       "step": 9200
+     },
+     {
+       "epoch": 0.88,
+       "learning_rate": 5.1295736824437736e-05,
+       "loss": 2.275,
+       "step": 9300
+     },
+     {
+       "epoch": 0.89,
+       "learning_rate": 5.117824773413897e-05,
+       "loss": 2.286,
+       "step": 9400
+     },
+     {
+       "epoch": 0.9,
+       "learning_rate": 5.1060758643840215e-05,
+       "loss": 2.3234,
+       "step": 9500
+     },
+     {
+       "epoch": 0.91,
+       "learning_rate": 5.094326955354146e-05,
+       "loss": 2.3009,
+       "step": 9600
+     },
+     {
+       "epoch": 0.92,
+       "learning_rate": 5.08257804632427e-05,
+       "loss": 2.1689,
+       "step": 9700
+     },
+     {
+       "epoch": 0.93,
+       "learning_rate": 5.070829137294394e-05,
+       "loss": 2.2665,
+       "step": 9800
+     },
+     {
+       "epoch": 0.93,
+       "learning_rate": 5.0590802282645186e-05,
+       "loss": 2.2566,
+       "step": 9900
+     },
+     {
+       "epoch": 0.94,
+       "learning_rate": 5.047331319234642e-05,
+       "loss": 2.2338,
+       "step": 10000
+     },
+     {
+       "epoch": 0.94,
+       "eval_bleu1": 26.4886,
+       "eval_bleu2": 15.0125,
+       "eval_bleu3": 8.5792,
+       "eval_bleu4": 4.8523,
+       "eval_gen_len": 41.1049,
+       "eval_loss": 2.8001697063446045,
+       "eval_rouge1": 33.2148,
+       "eval_rouge2": 11.5121,
+       "eval_rougeL": 22.7066,
+       "eval_runtime": 101.3032,
+       "eval_samples_per_second": 4.235,
+       "eval_steps_per_second": 0.039,
+       "step": 10000
+     },
+     {
+       "epoch": 0.95,
+       "learning_rate": 5.0355824102047665e-05,
+       "loss": 2.2267,
+       "step": 10100
+     },
+     {
+       "epoch": 0.96,
+       "learning_rate": 5.023833501174891e-05,
+       "loss": 2.198,
+       "step": 10200
+     },
+     {
+       "epoch": 0.97,
+       "learning_rate": 5.012084592145015e-05,
+       "loss": 2.2451,
+       "step": 10300
+     },
+     {
+       "epoch": 0.98,
+       "learning_rate": 5.000335683115139e-05,
+       "loss": 2.2533,
+       "step": 10400
+     },
+     {
+       "epoch": 0.99,
+       "learning_rate": 4.988586774085264e-05,
+       "loss": 2.3167,
+       "step": 10500
+     },
+     {
+       "epoch": 1.0,
+       "learning_rate": 4.976837865055387e-05,
+       "loss": 2.2111,
+       "step": 10600
+     },
+     {
+       "epoch": 1.01,
+       "learning_rate": 4.9650889560255116e-05,
+       "loss": 2.0181,
+       "step": 10700
+     },
+     {
+       "epoch": 1.02,
+       "learning_rate": 4.9533400469956365e-05,
+       "loss": 1.9661,
+       "step": 10800
+     },
+     {
+       "epoch": 1.03,
+       "learning_rate": 4.94159113796576e-05,
+       "loss": 1.9449,
+       "step": 10900
+     },
+     {
+       "epoch": 1.04,
+       "learning_rate": 4.9298422289358844e-05,
+       "loss": 2.0047,
+       "step": 11000
+     },
+     {
+       "epoch": 1.05,
+       "learning_rate": 4.918093319906009e-05,
+       "loss": 1.9197,
+       "step": 11100
+     },
+     {
+       "epoch": 1.06,
+       "learning_rate": 4.906344410876133e-05,
+       "loss": 1.9867,
+       "step": 11200
+     },
+     {
+       "epoch": 1.07,
+       "learning_rate": 4.8945955018462566e-05,
+       "loss": 2.0066,
+       "step": 11300
+     },
+     {
+       "epoch": 1.08,
+       "learning_rate": 4.8828465928163816e-05,
+       "loss": 2.0198,
+       "step": 11400
+     },
+     {
+       "epoch": 1.09,
+       "learning_rate": 4.871097683786505e-05,
+       "loss": 1.9711,
+       "step": 11500
+     },
+     {
+       "epoch": 1.1,
+       "learning_rate": 4.8593487747566294e-05,
+       "loss": 1.9725,
+       "step": 11600
+     },
+     {
+       "epoch": 1.1,
+       "learning_rate": 4.847599865726754e-05,
+       "loss": 2.032,
+       "step": 11700
+     },
+     {
+       "epoch": 1.11,
+       "learning_rate": 4.835850956696878e-05,
+       "loss": 1.9697,
+       "step": 11800
+     },
+     {
+       "epoch": 1.12,
+       "learning_rate": 4.824102047667002e-05,
+       "loss": 1.9865,
+       "step": 11900
+     },
+     {
+       "epoch": 1.13,
+       "learning_rate": 4.8123531386371266e-05,
+       "loss": 2.0073,
+       "step": 12000
+     },
+     {
+       "epoch": 1.14,
+       "learning_rate": 4.800604229607251e-05,
+       "loss": 1.9924,
+       "step": 12100
+     },
+     {
+       "epoch": 1.15,
+       "learning_rate": 4.788855320577375e-05,
+       "loss": 1.9583,
+       "step": 12200
+     },
+     {
+       "epoch": 1.16,
+       "learning_rate": 4.7771064115474994e-05,
+       "loss": 1.9727,
+       "step": 12300
+     },
+     {
+       "epoch": 1.17,
+       "learning_rate": 4.765357502517623e-05,
+       "loss": 1.9346,
+       "step": 12400
+     },
+     {
+       "epoch": 1.18,
+       "learning_rate": 4.753608593487748e-05,
+       "loss": 1.9879,
+       "step": 12500
+     },
+     {
+       "epoch": 1.19,
+       "learning_rate": 4.7418596844578716e-05,
+       "loss": 2.0059,
+       "step": 12600
+     },
+     {
+       "epoch": 1.2,
+       "learning_rate": 4.730110775427996e-05,
+       "loss": 1.9787,
+       "step": 12700
+     },
+     {
+       "epoch": 1.21,
+       "learning_rate": 4.71836186639812e-05,
+       "loss": 2.0469,
+       "step": 12800
+     },
+     {
+       "epoch": 1.22,
+       "learning_rate": 4.7066129573682445e-05,
+       "loss": 1.9941,
+       "step": 12900
+     },
+     {
+       "epoch": 1.23,
+       "learning_rate": 4.694864048338368e-05,
+       "loss": 1.9872,
+       "step": 13000
+     },
+     {
+       "epoch": 1.24,
+       "learning_rate": 4.683115139308493e-05,
+       "loss": 1.9744,
+       "step": 13100
+     },
+     {
+       "epoch": 1.25,
+       "learning_rate": 4.6713662302786166e-05,
+       "loss": 1.924,
+       "step": 13200
+     },
+     {
+       "epoch": 1.26,
+       "learning_rate": 4.659617321248741e-05,
+       "loss": 1.9875,
+       "step": 13300
+     },
+     {
+       "epoch": 1.27,
+       "learning_rate": 4.647868412218866e-05,
+       "loss": 1.9952,
+       "step": 13400
+     },
+     {
+       "epoch": 1.27,
+       "learning_rate": 4.6361195031889895e-05,
+       "loss": 2.0532,
+       "step": 13500
+     },
+     {
+       "epoch": 1.28,
+       "learning_rate": 4.624370594159114e-05,
+       "loss": 2.0257,
+       "step": 13600
+     },
+     {
+       "epoch": 1.29,
+       "learning_rate": 4.612621685129238e-05,
+       "loss": 1.9944,
+       "step": 13700
+     },
+     {
+       "epoch": 1.3,
+       "learning_rate": 4.6008727760993623e-05,
+       "loss": 1.9851,
+       "step": 13800
+     },
+     {
+       "epoch": 1.31,
+       "learning_rate": 4.589123867069486e-05,
+       "loss": 1.9465,
+       "step": 13900
+     },
+     {
+       "epoch": 1.32,
+       "learning_rate": 4.577374958039611e-05,
+       "loss": 1.9918,
+       "step": 14000
+     },
+     {
+       "epoch": 1.33,
+       "learning_rate": 4.5656260490097345e-05,
+       "loss": 1.9753,
+       "step": 14100
+     },
+     {
+       "epoch": 1.34,
+       "learning_rate": 4.553877139979859e-05,
+       "loss": 2.0087,
+       "step": 14200
+     },
+     {
+       "epoch": 1.35,
+       "learning_rate": 4.542128230949983e-05,
+       "loss": 2.0146,
+       "step": 14300
+     },
+     {
+       "epoch": 1.36,
+       "learning_rate": 4.5303793219201074e-05,
+       "loss": 1.9441,
+       "step": 14400
+     },
+     {
+       "epoch": 1.37,
+       "learning_rate": 4.518630412890231e-05,
+       "loss": 1.9914,
+       "step": 14500
+     },
+     {
+       "epoch": 1.38,
+       "learning_rate": 4.506881503860356e-05,
+       "loss": 2.0245,
+       "step": 14600
+     },
+     {
+       "epoch": 1.39,
+       "learning_rate": 4.49513259483048e-05,
+       "loss": 1.9698,
+       "step": 14700
+     },
+     {
+       "epoch": 1.4,
+       "learning_rate": 4.483383685800604e-05,
+       "loss": 1.9747,
+       "step": 14800
+     },
+     {
+       "epoch": 1.41,
+       "learning_rate": 4.471634776770729e-05,
+       "loss": 2.0027,
+       "step": 14900
+     },
+     {
+       "epoch": 1.42,
+       "learning_rate": 4.4598858677408524e-05,
+       "loss": 1.9652,
+       "step": 15000
+     },
+     {
+       "epoch": 1.42,
+       "eval_bleu1": 28.2628,
+       "eval_bleu2": 16.0909,
+       "eval_bleu3": 9.0427,
+       "eval_bleu4": 4.9254,
+       "eval_gen_len": 46.9744,
+       "eval_loss": 2.7699384689331055,
+       "eval_rouge1": 34.4269,
+       "eval_rouge2": 11.8551,
+       "eval_rougeL": 22.8478,
+       "eval_runtime": 110.644,
+       "eval_samples_per_second": 3.877,
+       "eval_steps_per_second": 0.036,
+       "step": 15000
+     },
+     {
+       "epoch": 1.43,
+       "learning_rate": 4.448136958710977e-05,
+       "loss": 2.015,
+       "step": 15100
+     },
+     {
+       "epoch": 1.44,
+       "learning_rate": 4.436388049681101e-05,
+       "loss": 1.9913,
+       "step": 15200
+     },
+     {
+       "epoch": 1.44,
+       "learning_rate": 4.424639140651225e-05,
+       "loss": 1.9891,
+       "step": 15300
+     },
+     {
+       "epoch": 1.45,
+       "learning_rate": 4.4128902316213495e-05,
+       "loss": 2.0327,
+       "step": 15400
+     },
+     {
+       "epoch": 1.46,
+       "learning_rate": 4.401141322591474e-05,
+       "loss": 1.9824,
+       "step": 15500
+     },
+     {
+       "epoch": 1.47,
+       "learning_rate": 4.3893924135615974e-05,
+       "loss": 2.0338,
+       "step": 15600
+     },
+     {
+       "epoch": 1.48,
+       "learning_rate": 4.3776435045317224e-05,
+       "loss": 2.015,
+       "step": 15700
+     },
+     {
+       "epoch": 1.49,
+       "learning_rate": 4.365894595501846e-05,
+       "loss": 2.0183,
+       "step": 15800
+     },
+     {
+       "epoch": 1.5,
+       "learning_rate": 4.35414568647197e-05,
+       "loss": 1.9903,
+       "step": 15900
+     },
+     {
+       "epoch": 1.51,
+       "learning_rate": 4.3423967774420946e-05,
+       "loss": 2.0171,
+       "step": 16000
+     },
+     {
+       "epoch": 1.52,
+       "learning_rate": 4.330647868412219e-05,
+       "loss": 2.0266,
+       "step": 16100
+     },
+     {
+       "epoch": 1.53,
+       "learning_rate": 4.318898959382343e-05,
+       "loss": 1.9222,
+       "step": 16200
+     },
+     {
+       "epoch": 1.54,
+       "learning_rate": 4.3071500503524674e-05,
+       "loss": 1.9535,
+       "step": 16300
+     },
+     {
+       "epoch": 1.55,
+       "learning_rate": 4.295401141322592e-05,
+       "loss": 2.009,
+       "step": 16400
+     },
+     {
+       "epoch": 1.56,
+       "learning_rate": 4.283652232292715e-05,
+       "loss": 1.9915,
+       "step": 16500
+     },
+     {
+       "epoch": 1.57,
+       "learning_rate": 4.27190332326284e-05,
+       "loss": 1.9631,
+       "step": 16600
+     },
+     {
+       "epoch": 1.58,
+       "learning_rate": 4.260154414232964e-05,
+       "loss": 2.0534,
+       "step": 16700
+     },
+     {
+       "epoch": 1.59,
+       "learning_rate": 4.248405505203088e-05,
+       "loss": 1.9991,
+       "step": 16800
+     },
+     {
+       "epoch": 1.6,
+       "learning_rate": 4.2366565961732124e-05,
+       "loss": 1.9622,
+       "step": 16900
+     },
+     {
+       "epoch": 1.6,
+       "learning_rate": 4.224907687143337e-05,
+       "loss": 2.0058,
+       "step": 17000
+     },
+     {
+       "epoch": 1.61,
+       "learning_rate": 4.21315877811346e-05,
+       "loss": 1.9901,
+       "step": 17100
+     },
+     {
+       "epoch": 1.62,
+       "learning_rate": 4.201409869083585e-05,
+       "loss": 1.9951,
+       "step": 17200
+     },
+     {
+       "epoch": 1.63,
+       "learning_rate": 4.189660960053709e-05,
+       "loss": 1.9505,
+       "step": 17300
+     },
+     {
+       "epoch": 1.64,
+       "learning_rate": 4.177912051023833e-05,
+       "loss": 1.9171,
+       "step": 17400
+     },
+     {
+       "epoch": 1.65,
+       "learning_rate": 4.166163141993958e-05,
+       "loss": 2.0443,
+       "step": 17500
+     },
+     {
+       "epoch": 1.66,
+       "learning_rate": 4.154414232964082e-05,
+       "loss": 2.0238,
+       "step": 17600
+     },
+     {
+       "epoch": 1.67,
+       "learning_rate": 4.142665323934206e-05,
+       "loss": 1.9011,
+       "step": 17700
+     },
+     {
+       "epoch": 1.68,
+       "learning_rate": 4.13091641490433e-05,
+       "loss": 2.0077,
+       "step": 17800
+     },
+     {
+       "epoch": 1.69,
+       "learning_rate": 4.1191675058744546e-05,
+       "loss": 1.9958,
+       "step": 17900
+     },
+     {
+       "epoch": 1.7,
+       "learning_rate": 4.107418596844578e-05,
+       "loss": 1.9903,
+       "step": 18000
+     },
+     {
+       "epoch": 1.71,
+       "learning_rate": 4.095669687814703e-05,
+       "loss": 1.9198,
+       "step": 18100
+     },
+     {
+       "epoch": 1.72,
+       "learning_rate": 4.083920778784827e-05,
+       "loss": 1.9204,
+       "step": 18200
+     },
+     {
+       "epoch": 1.73,
+       "learning_rate": 4.072171869754951e-05,
+       "loss": 1.955,
+       "step": 18300
+     },
+     {
+       "epoch": 1.74,
+       "learning_rate": 4.0604229607250753e-05,
+       "loss": 2.0038,
+       "step": 18400
+     },
+     {
+       "epoch": 1.75,
+       "learning_rate": 4.0486740516951996e-05,
+       "loss": 2.0018,
+       "step": 18500
+     },
+     {
+       "epoch": 1.76,
+       "learning_rate": 4.036925142665324e-05,
+       "loss": 2.0406,
+       "step": 18600
+     },
+     {
+       "epoch": 1.77,
+       "learning_rate": 4.025176233635448e-05,
+       "loss": 1.9958,
+       "step": 18700
+     },
+     {
+       "epoch": 1.77,
+       "learning_rate": 4.0134273246055725e-05,
+       "loss": 1.951,
+       "step": 18800
+     },
+     {
+       "epoch": 1.78,
+       "learning_rate": 4.001678415575697e-05,
+       "loss": 1.9721,
+       "step": 18900
+     },
+     {
+       "epoch": 1.79,
+       "learning_rate": 3.989929506545821e-05,
+       "loss": 1.9323,
+       "step": 19000
+     },
+     {
+       "epoch": 1.8,
+       "learning_rate": 3.9781805975159447e-05,
+       "loss": 1.9773,
+       "step": 19100
+     },
+     {
+       "epoch": 1.81,
+       "learning_rate": 3.9664316884860696e-05,
+       "loss": 2.0322,
+       "step": 19200
+     },
+     {
+       "epoch": 1.82,
+       "learning_rate": 3.954682779456193e-05,
+       "loss": 1.9171,
+       "step": 19300
+     },
+     {
+       "epoch": 1.83,
+       "learning_rate": 3.9429338704263175e-05,
+       "loss": 1.9805,
+       "step": 19400
+     },
+     {
+       "epoch": 1.84,
+       "learning_rate": 3.931184961396442e-05,
+       "loss": 1.9344,
1226
+ "step": 19500
1227
+ },
1228
+ {
1229
+ "epoch": 1.85,
1230
+ "learning_rate": 3.919436052366566e-05,
1231
+ "loss": 1.9163,
1232
+ "step": 19600
1233
+ },
1234
+ {
1235
+ "epoch": 1.86,
1236
+ "learning_rate": 3.90768714333669e-05,
1237
+ "loss": 1.9857,
1238
+ "step": 19700
1239
+ },
1240
+ {
1241
+ "epoch": 1.87,
1242
+ "learning_rate": 3.8959382343068146e-05,
1243
+ "loss": 1.9088,
1244
+ "step": 19800
1245
+ },
1246
+ {
1247
+ "epoch": 1.88,
1248
+ "learning_rate": 3.884189325276938e-05,
1249
+ "loss": 1.9217,
1250
+ "step": 19900
1251
+ },
1252
+ {
1253
+ "epoch": 1.89,
1254
+ "learning_rate": 3.8724404162470625e-05,
1255
+ "loss": 2.001,
1256
+ "step": 20000
1257
+ },
1258
+ {
1259
+ "epoch": 1.89,
1260
+ "eval_bleu1": 28.3593,
1261
+ "eval_bleu2": 16.1361,
1262
+ "eval_bleu3": 9.221,
1263
+ "eval_bleu4": 4.8616,
1264
+ "eval_gen_len": 46.979,
1265
+ "eval_loss": 2.720076322555542,
1266
+ "eval_rouge1": 34.157,
1267
+ "eval_rouge2": 11.8683,
1268
+ "eval_rougeL": 22.6775,
1269
+ "eval_runtime": 113.1379,
1270
+ "eval_samples_per_second": 3.792,
1271
+ "eval_steps_per_second": 0.035,
1272
+ "step": 20000
1273
+ },
1274
+ {
1275
+ "epoch": 1.9,
1276
+ "learning_rate": 3.8606915072171875e-05,
1277
+ "loss": 1.9878,
1278
+ "step": 20100
1279
+ },
1280
+ {
1281
+ "epoch": 1.91,
1282
+ "learning_rate": 3.848942598187311e-05,
1283
+ "loss": 1.9526,
1284
+ "step": 20200
1285
+ },
1286
+ {
1287
+ "epoch": 1.92,
1288
+ "learning_rate": 3.8371936891574354e-05,
1289
+ "loss": 1.9721,
1290
+ "step": 20300
1291
+ },
1292
+ {
1293
+ "epoch": 1.93,
1294
+ "learning_rate": 3.82544478012756e-05,
1295
+ "loss": 2.0391,
1296
+ "step": 20400
1297
+ },
1298
+ {
1299
+ "epoch": 1.94,
1300
+ "learning_rate": 3.813695871097684e-05,
1301
+ "loss": 1.9484,
1302
+ "step": 20500
1303
+ },
1304
+ {
1305
+ "epoch": 1.94,
1306
+ "learning_rate": 3.8019469620678076e-05,
1307
+ "loss": 1.9701,
1308
+ "step": 20600
1309
+ },
1310
+ {
1311
+ "epoch": 1.95,
1312
+ "learning_rate": 3.7901980530379325e-05,
1313
+ "loss": 2.015,
1314
+ "step": 20700
1315
+ },
1316
+ {
1317
+ "epoch": 1.96,
1318
+ "learning_rate": 3.778449144008056e-05,
1319
+ "loss": 1.9754,
1320
+ "step": 20800
1321
+ },
1322
+ {
1323
+ "epoch": 1.97,
1324
+ "learning_rate": 3.7667002349781804e-05,
1325
+ "loss": 1.9644,
1326
+ "step": 20900
1327
+ },
1328
+ {
1329
+ "epoch": 1.98,
1330
+ "learning_rate": 3.754951325948305e-05,
1331
+ "loss": 1.9766,
1332
+ "step": 21000
1333
+ },
1334
+ {
1335
+ "epoch": 1.99,
1336
+ "learning_rate": 3.743202416918429e-05,
1337
+ "loss": 1.9929,
1338
+ "step": 21100
1339
+ },
1340
+ {
1341
+ "epoch": 2.0,
1342
+ "learning_rate": 3.7314535078885526e-05,
1343
+ "loss": 1.937,
1344
+ "step": 21200
1345
+ },
1346
+ {
1347
+ "epoch": 2.01,
1348
+ "learning_rate": 3.7197045988586775e-05,
1349
+ "loss": 1.5585,
1350
+ "step": 21300
1351
+ },
1352
+ {
1353
+ "epoch": 2.02,
1354
+ "learning_rate": 3.707955689828801e-05,
1355
+ "loss": 1.6113,
1356
+ "step": 21400
1357
+ },
1358
+ {
1359
+ "epoch": 2.03,
1360
+ "learning_rate": 3.6962067807989254e-05,
1361
+ "loss": 1.5737,
1362
+ "step": 21500
1363
+ },
1364
+ {
1365
+ "epoch": 2.04,
1366
+ "learning_rate": 3.6844578717690504e-05,
1367
+ "loss": 1.6545,
1368
+ "step": 21600
1369
+ },
1370
+ {
1371
+ "epoch": 2.05,
1372
+ "learning_rate": 3.672708962739174e-05,
1373
+ "loss": 1.6024,
1374
+ "step": 21700
1375
+ },
1376
+ {
1377
+ "epoch": 2.06,
1378
+ "learning_rate": 3.660960053709298e-05,
1379
+ "loss": 1.6431,
1380
+ "step": 21800
1381
+ },
1382
+ {
1383
+ "epoch": 2.07,
1384
+ "learning_rate": 3.6492111446794226e-05,
1385
+ "loss": 1.5974,
1386
+ "step": 21900
1387
+ },
1388
+ {
1389
+ "epoch": 2.08,
1390
+ "learning_rate": 3.637462235649547e-05,
1391
+ "loss": 1.6111,
1392
+ "step": 22000
1393
+ },
1394
+ {
1395
+ "epoch": 2.09,
1396
+ "learning_rate": 3.625713326619671e-05,
1397
+ "loss": 1.6215,
1398
+ "step": 22100
1399
+ },
1400
+ {
1401
+ "epoch": 2.1,
1402
+ "learning_rate": 3.6139644175897954e-05,
1403
+ "loss": 1.5874,
1404
+ "step": 22200
1405
+ },
1406
+ {
1407
+ "epoch": 2.11,
1408
+ "learning_rate": 3.602215508559919e-05,
1409
+ "loss": 1.5834,
1410
+ "step": 22300
1411
+ },
1412
+ {
1413
+ "epoch": 2.11,
1414
+ "learning_rate": 3.590466599530044e-05,
1415
+ "loss": 1.6405,
1416
+ "step": 22400
1417
+ },
1418
+ {
1419
+ "epoch": 2.12,
1420
+ "learning_rate": 3.5787176905001676e-05,
1421
+ "loss": 1.5939,
1422
+ "step": 22500
1423
+ },
1424
+ {
1425
+ "epoch": 2.13,
1426
+ "learning_rate": 3.566968781470292e-05,
1427
+ "loss": 1.6135,
1428
+ "step": 22600
1429
+ },
1430
+ {
1431
+ "epoch": 2.14,
1432
+ "learning_rate": 3.555219872440416e-05,
1433
+ "loss": 1.6091,
1434
+ "step": 22700
1435
+ },
1436
+ {
1437
+ "epoch": 2.15,
1438
+ "learning_rate": 3.5434709634105405e-05,
1439
+ "loss": 1.6102,
1440
+ "step": 22800
1441
+ },
1442
+ {
1443
+ "epoch": 2.16,
1444
+ "learning_rate": 3.531722054380665e-05,
1445
+ "loss": 1.6148,
1446
+ "step": 22900
1447
+ },
1448
+ {
1449
+ "epoch": 2.17,
1450
+ "learning_rate": 3.519973145350789e-05,
1451
+ "loss": 1.6351,
1452
+ "step": 23000
1453
+ },
1454
+ {
1455
+ "epoch": 2.18,
1456
+ "learning_rate": 3.508224236320913e-05,
1457
+ "loss": 1.5941,
1458
+ "step": 23100
1459
+ },
1460
+ {
1461
+ "epoch": 2.19,
1462
+ "learning_rate": 3.496475327291037e-05,
1463
+ "loss": 1.6088,
1464
+ "step": 23200
1465
+ },
1466
+ {
1467
+ "epoch": 2.2,
1468
+ "learning_rate": 3.484726418261162e-05,
1469
+ "loss": 1.6255,
1470
+ "step": 23300
1471
+ },
1472
+ {
1473
+ "epoch": 2.21,
1474
+ "learning_rate": 3.4729775092312855e-05,
1475
+ "loss": 1.671,
1476
+ "step": 23400
1477
+ },
1478
+ {
1479
+ "epoch": 2.22,
1480
+ "learning_rate": 3.46122860020141e-05,
1481
+ "loss": 1.6096,
1482
+ "step": 23500
1483
+ },
1484
+ {
1485
+ "epoch": 2.23,
1486
+ "learning_rate": 3.449479691171534e-05,
1487
+ "loss": 1.6399,
1488
+ "step": 23600
1489
+ },
1490
+ {
1491
+ "epoch": 2.24,
1492
+ "learning_rate": 3.437730782141658e-05,
1493
+ "loss": 1.5641,
1494
+ "step": 23700
1495
+ },
1496
+ {
1497
+ "epoch": 2.25,
1498
+ "learning_rate": 3.425981873111782e-05,
1499
+ "loss": 1.6423,
1500
+ "step": 23800
1501
+ },
1502
+ {
1503
+ "epoch": 2.26,
1504
+ "learning_rate": 3.414232964081907e-05,
1505
+ "loss": 1.5897,
1506
+ "step": 23900
1507
+ },
1508
+ {
1509
+ "epoch": 2.27,
1510
+ "learning_rate": 3.4024840550520305e-05,
1511
+ "loss": 1.6378,
1512
+ "step": 24000
1513
+ },
1514
+ {
1515
+ "epoch": 2.28,
1516
+ "learning_rate": 3.390735146022155e-05,
1517
+ "loss": 1.6512,
1518
+ "step": 24100
1519
+ },
1520
+ {
1521
+ "epoch": 2.28,
1522
+ "learning_rate": 3.37898623699228e-05,
1523
+ "loss": 1.5813,
1524
+ "step": 24200
1525
+ },
1526
+ {
1527
+ "epoch": 2.29,
1528
+ "learning_rate": 3.3672373279624034e-05,
1529
+ "loss": 1.6065,
1530
+ "step": 24300
1531
+ },
1532
+ {
1533
+ "epoch": 2.3,
1534
+ "learning_rate": 3.3554884189325276e-05,
1535
+ "loss": 1.6269,
1536
+ "step": 24400
1537
+ },
1538
+ {
1539
+ "epoch": 2.31,
1540
+ "learning_rate": 3.343739509902652e-05,
1541
+ "loss": 1.6715,
1542
+ "step": 24500
1543
+ },
1544
+ {
1545
+ "epoch": 2.32,
1546
+ "learning_rate": 3.331990600872776e-05,
1547
+ "loss": 1.645,
1548
+ "step": 24600
1549
+ },
1550
+ {
1551
+ "epoch": 2.33,
1552
+ "learning_rate": 3.3202416918429e-05,
1553
+ "loss": 1.592,
1554
+ "step": 24700
1555
+ },
1556
+ {
1557
+ "epoch": 2.34,
1558
+ "learning_rate": 3.308492782813025e-05,
1559
+ "loss": 1.6165,
1560
+ "step": 24800
1561
+ },
1562
+ {
1563
+ "epoch": 2.35,
1564
+ "learning_rate": 3.2967438737831484e-05,
1565
+ "loss": 1.6097,
1566
+ "step": 24900
1567
+ },
1568
+ {
1569
+ "epoch": 2.36,
1570
+ "learning_rate": 3.284994964753273e-05,
1571
+ "loss": 1.6433,
1572
+ "step": 25000
1573
+ },
1574
+ {
1575
+ "epoch": 2.36,
1576
+ "eval_bleu1": 27.6475,
1577
+ "eval_bleu2": 15.6571,
1578
+ "eval_bleu3": 8.8372,
1579
+ "eval_bleu4": 4.8672,
1580
+ "eval_gen_len": 43.9953,
1581
+ "eval_loss": 2.7900760173797607,
1582
+ "eval_rouge1": 33.6354,
1583
+ "eval_rouge2": 11.5761,
1584
+ "eval_rougeL": 22.6878,
1585
+ "eval_runtime": 88.3325,
1586
+ "eval_samples_per_second": 4.857,
1587
+ "eval_steps_per_second": 0.045,
1588
+ "step": 25000
1589
+ },
1590
+ {
1591
+ "epoch": 2.37,
1592
+ "learning_rate": 3.273246055723397e-05,
1593
+ "loss": 1.616,
1594
+ "step": 25100
1595
+ },
1596
+ {
1597
+ "epoch": 2.38,
1598
+ "learning_rate": 3.261497146693521e-05,
1599
+ "loss": 1.5917,
1600
+ "step": 25200
1601
+ },
1602
+ {
1603
+ "epoch": 2.39,
1604
+ "learning_rate": 3.2497482376636455e-05,
1605
+ "loss": 1.5815,
1606
+ "step": 25300
1607
+ },
1608
+ {
1609
+ "epoch": 2.4,
1610
+ "learning_rate": 3.23799932863377e-05,
1611
+ "loss": 1.5837,
1612
+ "step": 25400
1613
+ },
1614
+ {
1615
+ "epoch": 2.41,
1616
+ "learning_rate": 3.226250419603894e-05,
1617
+ "loss": 1.6637,
1618
+ "step": 25500
1619
+ },
1620
+ {
1621
+ "epoch": 2.42,
1622
+ "learning_rate": 3.2145015105740184e-05,
1623
+ "loss": 1.6237,
1624
+ "step": 25600
1625
+ },
1626
+ {
1627
+ "epoch": 2.43,
1628
+ "learning_rate": 3.2027526015441427e-05,
1629
+ "loss": 1.629,
1630
+ "step": 25700
1631
+ },
1632
+ {
1633
+ "epoch": 2.44,
1634
+ "learning_rate": 3.191003692514266e-05,
1635
+ "loss": 1.6329,
1636
+ "step": 25800
1637
+ },
1638
+ {
1639
+ "epoch": 2.45,
1640
+ "learning_rate": 3.179254783484391e-05,
1641
+ "loss": 1.6019,
1642
+ "step": 25900
1643
+ },
1644
+ {
1645
+ "epoch": 2.45,
1646
+ "learning_rate": 3.167505874454515e-05,
1647
+ "loss": 1.6123,
1648
+ "step": 26000
1649
+ },
1650
+ {
1651
+ "epoch": 2.46,
1652
+ "learning_rate": 3.155756965424639e-05,
1653
+ "loss": 1.7134,
1654
+ "step": 26100
1655
+ },
1656
+ {
1657
+ "epoch": 2.47,
1658
+ "learning_rate": 3.1440080563947634e-05,
1659
+ "loss": 1.6092,
1660
+ "step": 26200
1661
+ },
1662
+ {
1663
+ "epoch": 2.48,
1664
+ "learning_rate": 3.132259147364888e-05,
1665
+ "loss": 1.6641,
1666
+ "step": 26300
1667
+ },
1668
+ {
1669
+ "epoch": 2.49,
1670
+ "learning_rate": 3.120510238335011e-05,
1671
+ "loss": 1.6283,
1672
+ "step": 26400
1673
+ },
1674
+ {
1675
+ "epoch": 2.5,
1676
+ "learning_rate": 3.108761329305136e-05,
1677
+ "loss": 1.5878,
1678
+ "step": 26500
1679
+ },
1680
+ {
1681
+ "epoch": 2.51,
1682
+ "learning_rate": 3.09701242027526e-05,
1683
+ "loss": 1.602,
1684
+ "step": 26600
1685
+ },
1686
+ {
1687
+ "epoch": 2.52,
1688
+ "learning_rate": 3.085263511245384e-05,
1689
+ "loss": 1.6025,
1690
+ "step": 26700
1691
+ },
1692
+ {
1693
+ "epoch": 2.53,
1694
+ "learning_rate": 3.0735146022155084e-05,
1695
+ "loss": 1.6628,
1696
+ "step": 26800
1697
+ },
1698
+ {
1699
+ "epoch": 2.54,
1700
+ "learning_rate": 3.061765693185633e-05,
1701
+ "loss": 1.6723,
1702
+ "step": 26900
1703
+ },
1704
+ {
1705
+ "epoch": 2.55,
1706
+ "learning_rate": 3.0500167841557573e-05,
1707
+ "loss": 1.6593,
1708
+ "step": 27000
1709
+ },
1710
+ {
1711
+ "epoch": 2.56,
1712
+ "learning_rate": 3.038267875125881e-05,
1713
+ "loss": 1.6143,
1714
+ "step": 27100
1715
+ },
1716
+ {
1717
+ "epoch": 2.57,
1718
+ "learning_rate": 3.0265189660960056e-05,
1719
+ "loss": 1.6176,
1720
+ "step": 27200
1721
+ },
1722
+ {
1723
+ "epoch": 2.58,
1724
+ "learning_rate": 3.0147700570661295e-05,
1725
+ "loss": 1.6353,
1726
+ "step": 27300
1727
+ },
1728
+ {
1729
+ "epoch": 2.59,
1730
+ "learning_rate": 3.0030211480362538e-05,
1731
+ "loss": 1.6185,
1732
+ "step": 27400
1733
+ },
1734
+ {
1735
+ "epoch": 2.6,
1736
+ "learning_rate": 2.9912722390063777e-05,
1737
+ "loss": 1.6467,
1738
+ "step": 27500
1739
+ },
1740
+ {
1741
+ "epoch": 2.61,
1742
+ "learning_rate": 2.9795233299765024e-05,
1743
+ "loss": 1.6083,
1744
+ "step": 27600
1745
+ },
1746
+ {
1747
+ "epoch": 2.62,
1748
+ "learning_rate": 2.967774420946626e-05,
1749
+ "loss": 1.701,
1750
+ "step": 27700
1751
+ },
1752
+ {
1753
+ "epoch": 2.62,
1754
+ "learning_rate": 2.9560255119167506e-05,
1755
+ "loss": 1.6665,
1756
+ "step": 27800
1757
+ },
1758
+ {
1759
+ "epoch": 2.63,
1760
+ "learning_rate": 2.9442766028868745e-05,
1761
+ "loss": 1.5699,
1762
+ "step": 27900
1763
+ },
1764
+ {
1765
+ "epoch": 2.64,
1766
+ "learning_rate": 2.9325276938569988e-05,
1767
+ "loss": 1.7012,
1768
+ "step": 28000
1769
+ },
1770
+ {
1771
+ "epoch": 2.65,
1772
+ "learning_rate": 2.9207787848271228e-05,
1773
+ "loss": 1.6608,
1774
+ "step": 28100
1775
+ },
1776
+ {
1777
+ "epoch": 2.66,
1778
+ "learning_rate": 2.9090298757972474e-05,
1779
+ "loss": 1.6536,
1780
+ "step": 28200
1781
+ },
1782
+ {
1783
+ "epoch": 2.67,
1784
+ "learning_rate": 2.8972809667673717e-05,
1785
+ "loss": 1.6572,
1786
+ "step": 28300
1787
+ },
1788
+ {
1789
+ "epoch": 2.68,
1790
+ "learning_rate": 2.8855320577374956e-05,
1791
+ "loss": 1.6291,
1792
+ "step": 28400
1793
+ },
1794
+ {
1795
+ "epoch": 2.69,
1796
+ "learning_rate": 2.8737831487076202e-05,
1797
+ "loss": 1.6501,
1798
+ "step": 28500
1799
+ },
1800
+ {
1801
+ "epoch": 2.7,
1802
+ "learning_rate": 2.8620342396777442e-05,
1803
+ "loss": 1.6456,
1804
+ "step": 28600
1805
+ },
1806
+ {
1807
+ "epoch": 2.71,
1808
+ "learning_rate": 2.8502853306478685e-05,
1809
+ "loss": 1.6701,
1810
+ "step": 28700
1811
+ },
1812
+ {
1813
+ "epoch": 2.72,
1814
+ "learning_rate": 2.8385364216179924e-05,
1815
+ "loss": 1.6873,
1816
+ "step": 28800
1817
+ },
1818
+ {
1819
+ "epoch": 2.73,
1820
+ "learning_rate": 2.826787512588117e-05,
1821
+ "loss": 1.5811,
1822
+ "step": 28900
1823
+ },
1824
+ {
1825
+ "epoch": 2.74,
1826
+ "learning_rate": 2.8150386035582406e-05,
1827
+ "loss": 1.6128,
1828
+ "step": 29000
1829
+ },
1830
+ {
1831
+ "epoch": 2.75,
1832
+ "learning_rate": 2.8032896945283653e-05,
1833
+ "loss": 1.653,
1834
+ "step": 29100
1835
+ },
1836
+ {
1837
+ "epoch": 2.76,
1838
+ "learning_rate": 2.7915407854984896e-05,
1839
+ "loss": 1.6867,
1840
+ "step": 29200
1841
+ },
1842
+ {
1843
+ "epoch": 2.77,
1844
+ "learning_rate": 2.7797918764686135e-05,
1845
+ "loss": 1.6462,
1846
+ "step": 29300
1847
+ },
1848
+ {
1849
+ "epoch": 2.78,
1850
+ "learning_rate": 2.7680429674387378e-05,
1851
+ "loss": 1.5985,
1852
+ "step": 29400
1853
+ },
1854
+ {
1855
+ "epoch": 2.79,
1856
+ "learning_rate": 2.756294058408862e-05,
1857
+ "loss": 1.6274,
1858
+ "step": 29500
1859
+ },
1860
+ {
1861
+ "epoch": 2.79,
1862
+ "learning_rate": 2.744545149378986e-05,
1863
+ "loss": 1.6512,
1864
+ "step": 29600
1865
+ },
1866
+ {
1867
+ "epoch": 2.8,
1868
+ "learning_rate": 2.7327962403491103e-05,
1869
+ "loss": 1.6,
1870
+ "step": 29700
1871
+ },
1872
+ {
1873
+ "epoch": 2.81,
1874
+ "learning_rate": 2.7210473313192346e-05,
1875
+ "loss": 1.7005,
1876
+ "step": 29800
1877
+ },
1878
+ {
1879
+ "epoch": 2.82,
1880
+ "learning_rate": 2.709298422289359e-05,
1881
+ "loss": 1.6796,
1882
+ "step": 29900
1883
+ },
1884
+ {
1885
+ "epoch": 2.83,
1886
+ "learning_rate": 2.6975495132594828e-05,
1887
+ "loss": 1.6204,
1888
+ "step": 30000
1889
+ },
1890
+ {
1891
+ "epoch": 2.83,
1892
+ "eval_bleu1": 29.1014,
1893
+ "eval_bleu2": 16.6689,
1894
+ "eval_bleu3": 9.3661,
1895
+ "eval_bleu4": 5.1916,
1896
+ "eval_gen_len": 48.8811,
1897
+ "eval_loss": 2.7724153995513916,
1898
+ "eval_rouge1": 34.9611,
1899
+ "eval_rouge2": 12.1606,
1900
+ "eval_rougeL": 23.0246,
1901
+ "eval_runtime": 137.6409,
1902
+ "eval_samples_per_second": 3.117,
1903
+ "eval_steps_per_second": 0.029,
1904
+ "step": 30000
1905
+ },
1906
+ {
1907
+ "epoch": 2.84,
1908
+ "learning_rate": 2.685800604229607e-05,
1909
+ "loss": 1.6336,
1910
+ "step": 30100
1911
+ },
1912
+ {
1913
+ "epoch": 2.85,
1914
+ "learning_rate": 2.6740516951997317e-05,
1915
+ "loss": 1.6349,
1916
+ "step": 30200
1917
+ },
1918
+ {
1919
+ "epoch": 2.86,
1920
+ "learning_rate": 2.6623027861698557e-05,
1921
+ "loss": 1.5879,
1922
+ "step": 30300
1923
+ },
1924
+ {
1925
+ "epoch": 2.87,
1926
+ "learning_rate": 2.65055387713998e-05,
1927
+ "loss": 1.6111,
1928
+ "step": 30400
1929
+ },
1930
+ {
1931
+ "epoch": 2.88,
1932
+ "learning_rate": 2.6388049681101042e-05,
1933
+ "loss": 1.6168,
1934
+ "step": 30500
1935
+ },
1936
+ {
1937
+ "epoch": 2.89,
1938
+ "learning_rate": 2.6270560590802282e-05,
1939
+ "loss": 1.6425,
1940
+ "step": 30600
1941
+ },
1942
+ {
1943
+ "epoch": 2.9,
1944
+ "learning_rate": 2.6153071500503525e-05,
1945
+ "loss": 1.6359,
1946
+ "step": 30700
1947
+ },
1948
+ {
1949
+ "epoch": 2.91,
1950
+ "learning_rate": 2.6035582410204767e-05,
1951
+ "loss": 1.6269,
1952
+ "step": 30800
1953
+ },
1954
+ {
1955
+ "epoch": 2.92,
1956
+ "learning_rate": 2.5918093319906007e-05,
1957
+ "loss": 1.6668,
1958
+ "step": 30900
1959
+ },
1960
+ {
1961
+ "epoch": 2.93,
1962
+ "learning_rate": 2.580060422960725e-05,
1963
+ "loss": 1.6364,
1964
+ "step": 31000
1965
+ },
1966
+ {
1967
+ "epoch": 2.94,
1968
+ "learning_rate": 2.5683115139308493e-05,
1969
+ "loss": 1.5957,
1970
+ "step": 31100
1971
+ },
1972
+ {
1973
+ "epoch": 2.95,
1974
+ "learning_rate": 2.5565626049009732e-05,
1975
+ "loss": 1.5837,
1976
+ "step": 31200
1977
+ },
1978
+ {
1979
+ "epoch": 2.96,
1980
+ "learning_rate": 2.5448136958710975e-05,
1981
+ "loss": 1.6382,
1982
+ "step": 31300
1983
+ },
1984
+ {
1985
+ "epoch": 2.96,
1986
+ "learning_rate": 2.5330647868412218e-05,
1987
+ "loss": 1.6077,
1988
+ "step": 31400
1989
+ },
1990
+ {
1991
+ "epoch": 2.97,
1992
+ "learning_rate": 2.521315877811346e-05,
1993
+ "loss": 1.6355,
1994
+ "step": 31500
1995
+ },
1996
+ {
1997
+ "epoch": 2.98,
1998
+ "learning_rate": 2.5095669687814703e-05,
1999
+ "loss": 1.6633,
2000
+ "step": 31600
2001
+ },
2002
+ {
2003
+ "epoch": 2.99,
2004
+ "learning_rate": 2.4978180597515946e-05,
2005
+ "loss": 1.5906,
2006
+ "step": 31700
2007
+ },
2008
+ {
2009
+ "epoch": 3.0,
2010
+ "learning_rate": 2.486069150721719e-05,
2011
+ "loss": 1.5347,
2012
+ "step": 31800
2013
+ },
2014
+ {
2015
+ "epoch": 3.01,
2016
+ "learning_rate": 2.474320241691843e-05,
2017
+ "loss": 1.2498,
2018
+ "step": 31900
2019
+ },
2020
+ {
2021
+ "epoch": 3.02,
2022
+ "learning_rate": 2.462571332661967e-05,
2023
+ "loss": 1.3091,
2024
+ "step": 32000
2025
+ },
2026
+ {
2027
+ "epoch": 3.03,
2028
+ "learning_rate": 2.4508224236320914e-05,
2029
+ "loss": 1.3252,
2030
+ "step": 32100
2031
+ },
2032
+ {
2033
+ "epoch": 3.04,
2034
+ "learning_rate": 2.4390735146022154e-05,
2035
+ "loss": 1.2999,
2036
+ "step": 32200
2037
+ },
2038
+ {
2039
+ "epoch": 3.05,
2040
+ "learning_rate": 2.4273246055723396e-05,
2041
+ "loss": 1.288,
2042
+ "step": 32300
2043
+ },
2044
+ {
2045
+ "epoch": 3.06,
2046
+ "learning_rate": 2.415575696542464e-05,
2047
+ "loss": 1.3072,
2048
+ "step": 32400
2049
+ },
2050
+ {
2051
+ "epoch": 3.07,
2052
+ "learning_rate": 2.403826787512588e-05,
2053
+ "loss": 1.2572,
2054
+ "step": 32500
2055
+ },
2056
+ {
2057
+ "epoch": 3.08,
2058
+ "learning_rate": 2.392077878482712e-05,
2059
+ "loss": 1.3139,
2060
+ "step": 32600
2061
+ },
2062
+ {
2063
+ "epoch": 3.09,
2064
+ "learning_rate": 2.3803289694528364e-05,
2065
+ "loss": 1.344,
2066
+ "step": 32700
2067
+ },
2068
+ {
2069
+ "epoch": 3.1,
2070
+ "learning_rate": 2.3685800604229604e-05,
2071
+ "loss": 1.3242,
2072
+ "step": 32800
2073
+ },
2074
+ {
2075
+ "epoch": 3.11,
2076
+ "learning_rate": 2.356831151393085e-05,
2077
+ "loss": 1.2924,
2078
+ "step": 32900
2079
+ },
2080
+ {
2081
+ "epoch": 3.12,
2082
+ "learning_rate": 2.3450822423632093e-05,
2083
+ "loss": 1.2989,
2084
+ "step": 33000
2085
+ },
2086
+ {
2087
+ "epoch": 3.12,
2088
+ "learning_rate": 2.3333333333333332e-05,
2089
+ "loss": 1.2983,
2090
+ "step": 33100
2091
+ },
2092
+ {
2093
+ "epoch": 3.13,
2094
+ "learning_rate": 2.3215844243034575e-05,
2095
+ "loss": 1.2902,
2096
+ "step": 33200
2097
+ },
2098
+ {
2099
+ "epoch": 3.14,
2100
+ "learning_rate": 2.3098355152735818e-05,
2101
+ "loss": 1.2479,
2102
+ "step": 33300
2103
+ },
2104
+ {
2105
+ "epoch": 3.15,
2106
+ "learning_rate": 2.298086606243706e-05,
2107
+ "loss": 1.2555,
2108
+ "step": 33400
2109
+ },
2110
+ {
2111
+ "epoch": 3.16,
2112
+ "learning_rate": 2.28633769721383e-05,
2113
+ "loss": 1.2613,
2114
+ "step": 33500
2115
+ },
2116
+ {
2117
+ "epoch": 3.17,
2118
+ "learning_rate": 2.2745887881839543e-05,
2119
+ "loss": 1.2782,
2120
+ "step": 33600
2121
+ },
2122
+ {
2123
+ "epoch": 3.18,
2124
+ "learning_rate": 2.2628398791540786e-05,
2125
+ "loss": 1.2686,
2126
+ "step": 33700
2127
+ },
2128
+ {
2129
+ "epoch": 3.19,
2130
+ "learning_rate": 2.2510909701242026e-05,
2131
+ "loss": 1.3088,
2132
+ "step": 33800
2133
+ },
2134
+ {
2135
+ "epoch": 3.2,
2136
+ "learning_rate": 2.239342061094327e-05,
2137
+ "loss": 1.3223,
2138
+ "step": 33900
2139
+ },
2140
+ {
2141
+ "epoch": 3.21,
2142
+ "learning_rate": 2.227593152064451e-05,
2143
+ "loss": 1.3449,
2144
+ "step": 34000
2145
+ },
2146
+ {
2147
+ "epoch": 3.22,
2148
+ "learning_rate": 2.215844243034575e-05,
2149
+ "loss": 1.3136,
2150
+ "step": 34100
2151
+ },
2152
+ {
2153
+ "epoch": 3.23,
2154
+ "learning_rate": 2.2040953340046994e-05,
2155
+ "loss": 1.2793,
2156
+ "step": 34200
2157
+ },
2158
+ {
2159
+ "epoch": 3.24,
2160
+ "learning_rate": 2.192346424974824e-05,
2161
+ "loss": 1.3015,
2162
+ "step": 34300
2163
+ },
2164
+ {
2165
+ "epoch": 3.25,
2166
+ "learning_rate": 2.180597515944948e-05,
2167
+ "loss": 1.2883,
2168
+ "step": 34400
2169
+ },
2170
+ {
2171
+ "epoch": 3.26,
2172
+ "learning_rate": 2.1688486069150722e-05,
2173
+ "loss": 1.3133,
2174
+ "step": 34500
2175
+ },
2176
+ {
2177
+ "epoch": 3.27,
2178
+ "learning_rate": 2.1570996978851965e-05,
2179
+ "loss": 1.3283,
2180
+ "step": 34600
2181
+ },
2182
+ {
2183
+ "epoch": 3.28,
2184
+ "learning_rate": 2.1453507888553204e-05,
2185
+ "loss": 1.267,
2186
+ "step": 34700
2187
+ },
2188
+ {
2189
+ "epoch": 3.29,
2190
+ "learning_rate": 2.1336018798254447e-05,
2191
+ "loss": 1.2919,
2192
+ "step": 34800
2193
+ },
2194
+ {
2195
+ "epoch": 3.29,
2196
+ "learning_rate": 2.121852970795569e-05,
2197
+ "loss": 1.3182,
2198
+ "step": 34900
2199
+ },
2200
+ {
2201
+ "epoch": 3.3,
2202
+ "learning_rate": 2.1101040617656933e-05,
2203
+ "loss": 1.2955,
2204
+ "step": 35000
2205
+ },
2206
+ {
2207
+ "epoch": 3.3,
2208
+ "eval_bleu1": 29.9701,
2209
+ "eval_bleu2": 17.3963,
2210
+ "eval_bleu3": 10.2978,
2211
+ "eval_bleu4": 5.9339,
2212
+ "eval_gen_len": 49.5921,
2213
+ "eval_loss": 2.8970282077789307,
2214
+ "eval_rouge1": 35.896,
2215
+ "eval_rouge2": 12.7037,
2216
+ "eval_rougeL": 23.3781,
2217
+ "eval_runtime": 133.8073,
2218
+ "eval_samples_per_second": 3.206,
2219
+ "eval_steps_per_second": 0.03,
2220
+ "step": 35000
2221
+ },
2222
+ {
2223
+ "epoch": 3.31,
2224
+ "learning_rate": 2.0983551527358172e-05,
2225
+ "loss": 1.3059,
2226
+ "step": 35100
2227
+ },
2228
+ {
2229
+ "epoch": 3.32,
2230
+ "learning_rate": 2.0866062437059415e-05,
2231
+ "loss": 1.2751,
2232
+ "step": 35200
2233
+ },
2234
+ {
2235
+ "epoch": 3.33,
2236
+ "learning_rate": 2.0748573346760658e-05,
2237
+ "loss": 1.3223,
2238
+ "step": 35300
2239
+ },
2240
+ {
2241
+ "epoch": 3.34,
2242
+ "learning_rate": 2.0631084256461897e-05,
2243
+ "loss": 1.2946,
2244
+ "step": 35400
2245
+ },
2246
+ {
2247
+ "epoch": 3.35,
2248
+ "learning_rate": 2.051359516616314e-05,
2249
+ "loss": 1.3418,
2250
+ "step": 35500
2251
+ },
2252
+ {
2253
+ "epoch": 3.36,
2254
+ "learning_rate": 2.0396106075864386e-05,
2255
+ "loss": 1.3068,
2256
+ "step": 35600
2257
+ },
2258
+ {
2259
+ "epoch": 3.37,
2260
+ "learning_rate": 2.0278616985565626e-05,
2261
+ "loss": 1.3108,
2262
+ "step": 35700
2263
+ },
2264
+ {
2265
+ "epoch": 3.38,
2266
+ "learning_rate": 2.016112789526687e-05,
2267
+ "loss": 1.3569,
2268
+ "step": 35800
2269
+ },
2270
+ {
2271
+ "epoch": 3.39,
2272
+ "learning_rate": 2.004363880496811e-05,
2273
+ "loss": 1.338,
2274
+ "step": 35900
2275
+ },
2276
+ {
2277
+ "epoch": 3.4,
2278
+ "learning_rate": 1.992614971466935e-05,
2279
+ "loss": 1.2918,
2280
+ "step": 36000
2281
+ },
2282
+ {
2283
+ "epoch": 3.41,
2284
+ "learning_rate": 1.9808660624370594e-05,
2285
+ "loss": 1.3495,
2286
+ "step": 36100
2287
+ },
2288
+ {
2289
+ "epoch": 3.42,
2290
+ "learning_rate": 1.9691171534071837e-05,
2291
+ "loss": 1.2948,
2292
+ "step": 36200
2293
+ },
2294
+ {
2295
+ "epoch": 3.43,
2296
+ "learning_rate": 1.9573682443773076e-05,
2297
+ "loss": 1.3311,
2298
+ "step": 36300
2299
+ },
2300
+ {
2301
+ "epoch": 3.44,
2302
+ "learning_rate": 1.945619335347432e-05,
2303
+ "loss": 1.3683,
2304
+ "step": 36400
2305
+ },
2306
+ {
2307
+ "epoch": 3.45,
2308
+ "learning_rate": 1.9338704263175562e-05,
2309
+ "loss": 1.2935,
2310
+ "step": 36500
2311
+ },
2312
+ {
2313
+ "epoch": 3.46,
2314
+ "learning_rate": 1.9221215172876805e-05,
2315
+ "loss": 1.3249,
2316
+ "step": 36600
2317
+ },
2318
+ {
2319
+ "epoch": 3.46,
2320
+ "learning_rate": 1.9103726082578044e-05,
2321
+ "loss": 1.3301,
2322
+ "step": 36700
2323
+ },
2324
+ {
2325
+ "epoch": 3.47,
2326
+ "learning_rate": 1.8986236992279287e-05,
2327
+ "loss": 1.2982,
2328
+ "step": 36800
2329
+ },
2330
+ {
2331
+ "epoch": 3.48,
2332
+ "learning_rate": 1.886874790198053e-05,
2333
+ "loss": 1.3244,
2334
+ "step": 36900
2335
+ },
2336
+ {
2337
+ "epoch": 3.49,
2338
+ "learning_rate": 1.8751258811681773e-05,
2339
+ "loss": 1.286,
2340
+ "step": 37000
2341
+ },
2342
+ {
2343
+ "epoch": 3.5,
2344
+ "learning_rate": 1.8633769721383016e-05,
2345
+ "loss": 1.343,
2346
+ "step": 37100
2347
+ },
2348
+ {
2349
+ "epoch": 3.51,
2350
+ "learning_rate": 1.851628063108426e-05,
2351
+ "loss": 1.3298,
2352
+ "step": 37200
2353
+ },
2354
+ {
2355
+ "epoch": 3.52,
2356
+ "learning_rate": 1.8398791540785498e-05,
2357
+ "loss": 1.3453,
2358
+ "step": 37300
2359
+ },
2360
+ {
2361
+ "epoch": 3.53,
2362
+ "learning_rate": 1.828130245048674e-05,
2363
+ "loss": 1.344,
2364
+ "step": 37400
2365
+ },
2366
+ {
2367
+ "epoch": 3.54,
2368
+ "learning_rate": 1.8163813360187984e-05,
2369
+ "loss": 1.3198,
2370
+ "step": 37500
2371
+ },
2372
+ {
2373
+ "epoch": 3.55,
2374
+ "learning_rate": 1.8046324269889223e-05,
2375
+ "loss": 1.3162,
2376
+ "step": 37600
2377
+ },
2378
+ {
2379
+ "epoch": 3.56,
2380
+ "learning_rate": 1.7928835179590466e-05,
2381
+ "loss": 1.3382,
2382
+ "step": 37700
2383
+ },
2384
+ {
2385
+ "epoch": 3.57,
2386
+ "learning_rate": 1.781134608929171e-05,
2387
+ "loss": 1.3602,
2388
+ "step": 37800
2389
+ },
2390
+ {
2391
+ "epoch": 3.58,
2392
+ "learning_rate": 1.7693856998992948e-05,
2393
+ "loss": 1.2834,
2394
+ "step": 37900
2395
+ },
2396
+ {
2397
+ "epoch": 3.59,
2398
+ "learning_rate": 1.757636790869419e-05,
2399
+ "loss": 1.3467,
2400
+ "step": 38000
2401
+ },
2402
+ {
2403
+ "epoch": 3.6,
2404
+ "learning_rate": 1.7458878818395434e-05,
2405
+ "loss": 1.3505,
2406
+ "step": 38100
2407
+ },
2408
+ {
2409
+ "epoch": 3.61,
2410
+ "learning_rate": 1.7341389728096677e-05,
2411
+ "loss": 1.3005,
2412
+ "step": 38200
2413
+ },
2414
+ {
2415
+ "epoch": 3.62,
2416
+ "learning_rate": 1.722390063779792e-05,
2417
+ "loss": 1.3155,
2418
+ "step": 38300
2419
+ },
2420
+ {
2421
+ "epoch": 3.63,
2422
+ "learning_rate": 1.7106411547499162e-05,
2423
+ "loss": 1.3164,
2424
+ "step": 38400
2425
+ },
2426
+ {
2427
+ "epoch": 3.63,
2428
+ "learning_rate": 1.6988922457200405e-05,
2429
+ "loss": 1.2915,
2430
+ "step": 38500
2431
+ },
2432
+ {
2433
+ "epoch": 3.64,
2434
+ "learning_rate": 1.6871433366901645e-05,
2435
+ "loss": 1.3501,
2436
+ "step": 38600
2437
+ },
2438
+ {
2439
+ "epoch": 3.65,
2440
+ "learning_rate": 1.6753944276602887e-05,
2441
+ "loss": 1.345,
2442
+ "step": 38700
2443
+ },
2444
+ {
2445
+ "epoch": 3.66,
2446
+ "learning_rate": 1.663645518630413e-05,
2447
+ "loss": 1.3215,
2448
+ "step": 38800
2449
+ },
2450
+ {
2451
+ "epoch": 3.67,
2452
+ "learning_rate": 1.651896609600537e-05,
2453
+ "loss": 1.3088,
2454
+ "step": 38900
2455
+ },
2456
+ {
2457
+ "epoch": 3.68,
2458
+ "learning_rate": 1.6401477005706613e-05,
2459
+ "loss": 1.266,
2460
+ "step": 39000
2461
+ },
2462
+ {
2463
+ "epoch": 3.69,
2464
+ "learning_rate": 1.6283987915407855e-05,
2465
+ "loss": 1.3522,
2466
+ "step": 39100
2467
+ },
2468
+ {
2469
+ "epoch": 3.7,
2470
+ "learning_rate": 1.6166498825109095e-05,
2471
+ "loss": 1.298,
2472
+ "step": 39200
2473
+ },
2474
+ {
2475
+ "epoch": 3.71,
2476
+ "learning_rate": 1.6049009734810338e-05,
2477
+ "loss": 1.255,
2478
+ "step": 39300
2479
+ },
2480
+ {
2481
+ "epoch": 3.72,
2482
+ "learning_rate": 1.593152064451158e-05,
2483
+ "loss": 1.2987,
2484
+ "step": 39400
2485
+ },
2486
+ {
2487
+ "epoch": 3.73,
2488
+ "learning_rate": 1.581403155421282e-05,
2489
+ "loss": 1.3309,
2490
+ "step": 39500
2491
+ },
2492
+ {
2493
+ "epoch": 3.74,
2494
+ "learning_rate": 1.5696542463914063e-05,
2495
+ "loss": 1.2839,
2496
+ "step": 39600
2497
+ },
2498
+ {
2499
+ "epoch": 3.75,
2500
+ "learning_rate": 1.557905337361531e-05,
2501
+ "loss": 1.3109,
2502
+ "step": 39700
2503
+ },
2504
+ {
2505
+ "epoch": 3.76,
2506
+ "learning_rate": 1.546156428331655e-05,
2507
+ "loss": 1.3073,
2508
+ "step": 39800
2509
+ },
2510
+ {
2511
+ "epoch": 3.77,
2512
+ "learning_rate": 1.534407519301779e-05,
2513
+ "loss": 1.339,
2514
+ "step": 39900
2515
+ },
2516
+ {
2517
+ "epoch": 3.78,
2518
+ "learning_rate": 1.5226586102719034e-05,
2519
+ "loss": 1.3501,
2520
+ "step": 40000
2521
+ },
2522
+ {
2523
+ "epoch": 3.78,
2524
+ "eval_bleu1": 29.483,
2525
+ "eval_bleu2": 16.7795,
2526
+ "eval_bleu3": 9.4124,
2527
+ "eval_bleu4": 5.2042,
2528
+ "eval_gen_len": 48.5897,
2529
+ "eval_loss": 2.8853633403778076,
2530
+ "eval_rouge1": 35.2981,
2531
+ "eval_rouge2": 12.1133,
2532
+ "eval_rougeL": 23.1845,
2533
+ "eval_runtime": 112.9023,
2534
+ "eval_samples_per_second": 3.8,
2535
+ "eval_steps_per_second": 0.035,
2536
+ "step": 40000
2537
+ },
2538
+ {
2539
+ "epoch": 3.79,
2540
+ "learning_rate": 1.5109097012420275e-05,
2541
+ "loss": 1.2452,
2542
+ "step": 40100
2543
+ },
2544
+ {
2545
+ "epoch": 3.8,
2546
+ "learning_rate": 1.4991607922121517e-05,
2547
+ "loss": 1.3348,
2548
+ "step": 40200
2549
+ },
2550
+ {
2551
+ "epoch": 3.8,
2552
+ "learning_rate": 1.487411883182276e-05,
2553
+ "loss": 1.3156,
2554
+ "step": 40300
2555
+ },
2556
+ {
2557
+ "epoch": 3.81,
2558
+ "learning_rate": 1.4756629741524e-05,
2559
+ "loss": 1.3597,
2560
+ "step": 40400
2561
+ },
2562
+ {
2563
+ "epoch": 3.82,
2564
+ "learning_rate": 1.4639140651225243e-05,
2565
+ "loss": 1.362,
2566
+ "step": 40500
2567
+ },
2568
+ {
2569
+ "epoch": 3.83,
2570
+ "learning_rate": 1.4521651560926484e-05,
2571
+ "loss": 1.2591,
2572
+ "step": 40600
2573
+ },
2574
+ {
2575
+ "epoch": 3.84,
2576
+ "learning_rate": 1.4404162470627726e-05,
2577
+ "loss": 1.3309,
2578
+ "step": 40700
2579
+ },
2580
+ {
2581
+ "epoch": 3.85,
2582
+ "learning_rate": 1.4286673380328968e-05,
2583
+ "loss": 1.3623,
2584
+ "step": 40800
2585
+ },
2586
+ {
2587
+ "epoch": 3.86,
2588
+ "learning_rate": 1.416918429003021e-05,
2589
+ "loss": 1.3032,
2590
+ "step": 40900
2591
+ },
2592
+ {
2593
+ "epoch": 3.87,
2594
+ "learning_rate": 1.4051695199731454e-05,
2595
+ "loss": 1.3075,
2596
+ "step": 41000
2597
+ },
2598
+ {
2599
+ "epoch": 3.88,
2600
+ "learning_rate": 1.3934206109432695e-05,
2601
+ "loss": 1.3549,
2602
+ "step": 41100
2603
+ },
2604
+ {
2605
+ "epoch": 3.89,
2606
+ "learning_rate": 1.3816717019133936e-05,
2607
+ "loss": 1.3257,
2608
+ "step": 41200
2609
+ },
2610
+ {
2611
+ "epoch": 3.9,
2612
+ "learning_rate": 1.369922792883518e-05,
2613
+ "loss": 1.3262,
2614
+ "step": 41300
2615
+ },
2616
+ {
2617
+ "epoch": 3.91,
2618
+ "learning_rate": 1.3581738838536422e-05,
2619
+ "loss": 1.3143,
2620
+ "step": 41400
2621
+ },
2622
+ {
2623
+ "epoch": 3.92,
2624
+ "learning_rate": 1.3464249748237663e-05,
2625
+ "loss": 1.341,
2626
+ "step": 41500
2627
+ },
2628
+ {
2629
+ "epoch": 3.93,
2630
+ "learning_rate": 1.3346760657938906e-05,
2631
+ "loss": 1.3282,
2632
+ "step": 41600
2633
+ },
2634
+ {
2635
+ "epoch": 3.94,
2636
+ "learning_rate": 1.3229271567640147e-05,
2637
+ "loss": 1.3146,
2638
+ "step": 41700
2639
+ },
2640
+ {
2641
+ "epoch": 3.95,
2642
+ "learning_rate": 1.3111782477341388e-05,
2643
+ "loss": 1.3032,
2644
+ "step": 41800
2645
+ },
2646
+ {
2647
+ "epoch": 3.96,
2648
+ "learning_rate": 1.2994293387042631e-05,
2649
+ "loss": 1.3441,
2650
+ "step": 41900
2651
+ },
2652
+ {
2653
+ "epoch": 3.97,
2654
+ "learning_rate": 1.2876804296743874e-05,
2655
+ "loss": 1.3165,
2656
+ "step": 42000
2657
+ },
2658
+ {
2659
+ "epoch": 3.97,
2660
+ "learning_rate": 1.2759315206445117e-05,
2661
+ "loss": 1.3309,
2662
+ "step": 42100
2663
+ },
2664
+ {
2665
+ "epoch": 3.98,
2666
+ "learning_rate": 1.2641826116146358e-05,
2667
+ "loss": 1.3448,
2668
+ "step": 42200
2669
+ },
2670
+ {
2671
+ "epoch": 3.99,
2672
+ "learning_rate": 1.25243370258476e-05,
2673
+ "loss": 1.3181,
2674
+ "step": 42300
2675
+ },
2676
+ {
2677
+ "epoch": 4.0,
2678
+ "learning_rate": 1.2406847935548842e-05,
2679
+ "loss": 1.2496,
2680
+ "step": 42400
2681
+ },
2682
+ {
2683
+ "epoch": 4.01,
2684
+ "learning_rate": 1.2289358845250083e-05,
2685
+ "loss": 1.0748,
2686
+ "step": 42500
2687
+ },
2688
+ {
2689
+ "epoch": 4.02,
2690
+ "learning_rate": 1.2171869754951324e-05,
2691
+ "loss": 1.0485,
2692
+ "step": 42600
2693
+ },
2694
+ {
2695
+ "epoch": 4.03,
2696
+ "learning_rate": 1.2054380664652569e-05,
2697
+ "loss": 1.0712,
2698
+ "step": 42700
2699
+ },
2700
+ {
2701
+ "epoch": 4.04,
2702
+ "learning_rate": 1.193689157435381e-05,
2703
+ "loss": 1.0456,
2704
+ "step": 42800
2705
+ },
2706
+ {
2707
+ "epoch": 4.05,
2708
+ "learning_rate": 1.1819402484055053e-05,
2709
+ "loss": 1.0662,
2710
+ "step": 42900
2711
+ },
2712
+ {
2713
+ "epoch": 4.06,
2714
+ "learning_rate": 1.1701913393756294e-05,
2715
+ "loss": 1.0658,
2716
+ "step": 43000
2717
+ },
2718
+ {
2719
+ "epoch": 4.07,
2720
+ "learning_rate": 1.1584424303457535e-05,
2721
+ "loss": 1.0439,
2722
+ "step": 43100
2723
+ },
2724
+ {
2725
+ "epoch": 4.08,
2726
+ "learning_rate": 1.1466935213158778e-05,
2727
+ "loss": 1.0558,
2728
+ "step": 43200
2729
+ },
2730
+ {
2731
+ "epoch": 4.09,
2732
+ "learning_rate": 1.134944612286002e-05,
2733
+ "loss": 1.0442,
2734
+ "step": 43300
2735
+ },
2736
+ {
2737
+ "epoch": 4.1,
2738
+ "learning_rate": 1.1231957032561262e-05,
2739
+ "loss": 1.1046,
2740
+ "step": 43400
2741
+ },
2742
+ {
2743
+ "epoch": 4.11,
2744
+ "learning_rate": 1.1114467942262505e-05,
2745
+ "loss": 1.0772,
2746
+ "step": 43500
2747
+ },
2748
+ {
2749
+ "epoch": 4.12,
2750
+ "learning_rate": 1.0996978851963746e-05,
2751
+ "loss": 1.0464,
2752
+ "step": 43600
2753
+ },
2754
+ {
2755
+ "epoch": 4.13,
2756
+ "learning_rate": 1.0879489761664989e-05,
2757
+ "loss": 1.049,
2758
+ "step": 43700
2759
+ },
2760
+ {
2761
+ "epoch": 4.14,
2762
+ "learning_rate": 1.076200067136623e-05,
2763
+ "loss": 1.0382,
2764
+ "step": 43800
2765
+ },
2766
+ {
2767
+ "epoch": 4.14,
2768
+ "learning_rate": 1.0644511581067471e-05,
2769
+ "loss": 1.0537,
2770
+ "step": 43900
2771
+ },
2772
+ {
2773
+ "epoch": 4.15,
2774
+ "learning_rate": 1.0527022490768716e-05,
2775
+ "loss": 1.0735,
2776
+ "step": 44000
2777
+ },
2778
+ {
2779
+ "epoch": 4.16,
2780
+ "learning_rate": 1.0409533400469957e-05,
2781
+ "loss": 1.0729,
2782
+ "step": 44100
2783
+ },
2784
+ {
2785
+ "epoch": 4.17,
2786
+ "learning_rate": 1.0292044310171198e-05,
2787
+ "loss": 1.0703,
2788
+ "step": 44200
2789
+ },
2790
+ {
2791
+ "epoch": 4.18,
2792
+ "learning_rate": 1.017455521987244e-05,
2793
+ "loss": 1.0869,
2794
+ "step": 44300
2795
+ },
2796
+ {
2797
+ "epoch": 4.19,
2798
+ "learning_rate": 1.0057066129573682e-05,
2799
+ "loss": 1.0986,
2800
+ "step": 44400
2801
+ },
2802
+ {
2803
+ "epoch": 4.2,
2804
+ "learning_rate": 9.939577039274923e-06,
2805
+ "loss": 1.1168,
2806
+ "step": 44500
2807
+ },
2808
+ {
2809
+ "epoch": 4.21,
2810
+ "learning_rate": 9.822087948976166e-06,
2811
+ "loss": 1.0136,
2812
+ "step": 44600
2813
+ },
2814
+ {
2815
+ "epoch": 4.22,
2816
+ "learning_rate": 9.704598858677409e-06,
2817
+ "loss": 1.0433,
2818
+ "step": 44700
2819
+ },
2820
+ {
2821
+ "epoch": 4.23,
2822
+ "learning_rate": 9.587109768378652e-06,
2823
+ "loss": 1.0302,
2824
+ "step": 44800
2825
+ },
2826
+ {
2827
+ "epoch": 4.24,
2828
+ "learning_rate": 9.469620678079893e-06,
2829
+ "loss": 1.0834,
2830
+ "step": 44900
2831
+ },
2832
+ {
2833
+ "epoch": 4.25,
2834
+ "learning_rate": 9.352131587781134e-06,
2835
+ "loss": 1.0865,
2836
+ "step": 45000
2837
+ },
2838
+ {
2839
+ "epoch": 4.25,
2840
+ "eval_bleu1": 29.9364,
2841
+ "eval_bleu2": 17.2064,
2842
+ "eval_bleu3": 10.0427,
2843
+ "eval_bleu4": 5.62,
2844
+ "eval_gen_len": 48.31,
2845
+ "eval_loss": 2.9911699295043945,
2846
+ "eval_rouge1": 35.581,
2847
+ "eval_rouge2": 12.5145,
2848
+ "eval_rougeL": 23.2262,
2849
+ "eval_runtime": 132.7683,
2850
+ "eval_samples_per_second": 3.231,
2851
+ "eval_steps_per_second": 0.03,
2852
+ "step": 45000
2853
+ },
2854
+ {
2855
+ "epoch": 4.26,
2856
+ "learning_rate": 9.234642497482377e-06,
2857
+ "loss": 1.0786,
2858
+ "step": 45100
2859
+ },
2860
+ {
2861
+ "epoch": 4.27,
2862
+ "learning_rate": 9.117153407183618e-06,
2863
+ "loss": 1.0742,
2864
+ "step": 45200
2865
+ },
2866
+ {
2867
+ "epoch": 4.28,
2868
+ "learning_rate": 8.999664316884859e-06,
2869
+ "loss": 1.0674,
2870
+ "step": 45300
2871
+ },
2872
+ {
2873
+ "epoch": 4.29,
2874
+ "learning_rate": 8.882175226586104e-06,
2875
+ "loss": 1.0379,
2876
+ "step": 45400
2877
+ },
2878
+ {
2879
+ "epoch": 4.3,
2880
+ "learning_rate": 8.764686136287345e-06,
2881
+ "loss": 1.0856,
2882
+ "step": 45500
2883
+ },
2884
+ {
2885
+ "epoch": 4.31,
2886
+ "learning_rate": 8.647197045988588e-06,
2887
+ "loss": 1.0653,
2888
+ "step": 45600
2889
+ },
2890
+ {
2891
+ "epoch": 4.31,
2892
+ "learning_rate": 8.529707955689829e-06,
2893
+ "loss": 1.0765,
2894
+ "step": 45700
2895
+ },
2896
+ {
2897
+ "epoch": 4.32,
2898
+ "learning_rate": 8.41221886539107e-06,
2899
+ "loss": 1.0777,
2900
+ "step": 45800
2901
+ },
2902
+ {
2903
+ "epoch": 4.33,
2904
+ "learning_rate": 8.294729775092313e-06,
2905
+ "loss": 1.0399,
2906
+ "step": 45900
2907
+ },
2908
+ {
2909
+ "epoch": 4.34,
2910
+ "learning_rate": 8.177240684793554e-06,
2911
+ "loss": 1.0091,
2912
+ "step": 46000
2913
+ },
2914
+ {
2915
+ "epoch": 4.35,
2916
+ "learning_rate": 8.059751594494797e-06,
2917
+ "loss": 1.0784,
2918
+ "step": 46100
2919
+ },
2920
+ {
2921
+ "epoch": 4.36,
2922
+ "learning_rate": 7.94226250419604e-06,
2923
+ "loss": 1.0458,
2924
+ "step": 46200
2925
+ },
2926
+ {
2927
+ "epoch": 4.37,
2928
+ "learning_rate": 7.82477341389728e-06,
2929
+ "loss": 1.0492,
2930
+ "step": 46300
2931
+ },
2932
+ {
2933
+ "epoch": 4.38,
2934
+ "learning_rate": 7.707284323598524e-06,
2935
+ "loss": 1.1092,
2936
+ "step": 46400
2937
+ },
2938
+ {
2939
+ "epoch": 4.39,
2940
+ "learning_rate": 7.589795233299765e-06,
2941
+ "loss": 1.0529,
2942
+ "step": 46500
2943
+ },
2944
+ {
2945
+ "epoch": 4.4,
2946
+ "learning_rate": 7.472306143001007e-06,
2947
+ "loss": 1.0548,
2948
+ "step": 46600
2949
+ },
2950
+ {
2951
+ "epoch": 4.41,
2952
+ "learning_rate": 7.3548170527022495e-06,
2953
+ "loss": 1.036,
2954
+ "step": 46700
2955
+ },
2956
+ {
2957
+ "epoch": 4.42,
2958
+ "learning_rate": 7.2373279624034915e-06,
2959
+ "loss": 1.0559,
2960
+ "step": 46800
2961
+ },
2962
+ {
2963
+ "epoch": 4.43,
2964
+ "learning_rate": 7.1198388721047335e-06,
2965
+ "loss": 1.0745,
2966
+ "step": 46900
2967
+ },
2968
+ {
2969
+ "epoch": 4.44,
2970
+ "learning_rate": 7.0023497818059755e-06,
2971
+ "loss": 1.0568,
2972
+ "step": 47000
2973
+ },
2974
+ {
2975
+ "epoch": 4.45,
2976
+ "learning_rate": 6.884860691507217e-06,
2977
+ "loss": 1.092,
2978
+ "step": 47100
2979
+ },
2980
+ {
2981
+ "epoch": 4.46,
2982
+ "learning_rate": 6.7673716012084595e-06,
2983
+ "loss": 1.0348,
2984
+ "step": 47200
2985
+ },
2986
+ {
2987
+ "epoch": 4.47,
2988
+ "learning_rate": 6.6498825109097015e-06,
2989
+ "loss": 1.0851,
2990
+ "step": 47300
2991
+ },
2992
+ {
2993
+ "epoch": 4.48,
2994
+ "learning_rate": 6.5323934206109435e-06,
2995
+ "loss": 1.0879,
2996
+ "step": 47400
2997
+ },
2998
+ {
2999
+ "epoch": 4.48,
3000
+ "learning_rate": 6.414904330312185e-06,
3001
+ "loss": 1.0162,
3002
+ "step": 47500
3003
+ },
3004
+ {
3005
+ "epoch": 4.49,
3006
+ "learning_rate": 6.2974152400134274e-06,
3007
+ "loss": 1.0917,
3008
+ "step": 47600
3009
+ },
3010
+ {
3011
+ "epoch": 4.5,
3012
+ "learning_rate": 6.1799261497146694e-06,
3013
+ "loss": 1.0483,
3014
+ "step": 47700
3015
+ },
3016
+ {
3017
+ "epoch": 4.51,
3018
+ "learning_rate": 6.0624370594159114e-06,
3019
+ "loss": 1.0914,
3020
+ "step": 47800
3021
+ },
3022
+ {
3023
+ "epoch": 4.52,
3024
+ "learning_rate": 5.944947969117153e-06,
3025
+ "loss": 1.0819,
3026
+ "step": 47900
3027
+ },
3028
+ {
3029
+ "epoch": 4.53,
3030
+ "learning_rate": 5.827458878818395e-06,
3031
+ "loss": 1.0699,
3032
+ "step": 48000
3033
+ },
3034
+ {
3035
+ "epoch": 4.54,
3036
+ "learning_rate": 5.709969788519637e-06,
3037
+ "loss": 1.0507,
3038
+ "step": 48100
3039
+ },
3040
+ {
3041
+ "epoch": 4.55,
3042
+ "learning_rate": 5.592480698220879e-06,
3043
+ "loss": 1.1011,
3044
+ "step": 48200
3045
+ },
3046
+ {
3047
+ "epoch": 4.56,
3048
+ "learning_rate": 5.474991607922121e-06,
3049
+ "loss": 1.0611,
3050
+ "step": 48300
3051
+ },
3052
+ {
3053
+ "epoch": 4.57,
3054
+ "learning_rate": 5.357502517623363e-06,
3055
+ "loss": 1.0708,
3056
+ "step": 48400
3057
+ },
3058
+ {
3059
+ "epoch": 4.58,
3060
+ "learning_rate": 5.240013427324605e-06,
3061
+ "loss": 1.0579,
3062
+ "step": 48500
3063
+ },
3064
+ {
3065
+ "epoch": 4.59,
3066
+ "learning_rate": 5.122524337025847e-06,
3067
+ "loss": 1.0582,
3068
+ "step": 48600
3069
+ },
3070
+ {
3071
+ "epoch": 4.6,
3072
+ "learning_rate": 5.005035246727089e-06,
3073
+ "loss": 1.0748,
3074
+ "step": 48700
3075
+ },
3076
+ {
3077
+ "epoch": 4.61,
3078
+ "learning_rate": 4.887546156428331e-06,
3079
+ "loss": 1.0763,
3080
+ "step": 48800
3081
+ },
3082
+ {
3083
+ "epoch": 4.62,
3084
+ "learning_rate": 4.770057066129574e-06,
3085
+ "loss": 1.085,
3086
+ "step": 48900
3087
+ },
3088
+ {
3089
+ "epoch": 4.63,
3090
+ "learning_rate": 4.652567975830815e-06,
3091
+ "loss": 1.059,
3092
+ "step": 49000
3093
+ },
3094
+ {
3095
+ "epoch": 4.64,
3096
+ "learning_rate": 4.535078885532057e-06,
3097
+ "loss": 1.119,
3098
+ "step": 49100
3099
+ },
3100
+ {
3101
+ "epoch": 4.65,
3102
+ "learning_rate": 4.417589795233299e-06,
3103
+ "loss": 1.0784,
3104
+ "step": 49200
3105
+ },
3106
+ {
3107
+ "epoch": 4.65,
3108
+ "learning_rate": 4.300100704934542e-06,
3109
+ "loss": 1.0702,
3110
+ "step": 49300
3111
+ },
3112
+ {
3113
+ "epoch": 4.66,
3114
+ "learning_rate": 4.182611614635783e-06,
3115
+ "loss": 1.0722,
3116
+ "step": 49400
3117
+ },
3118
+ {
3119
+ "epoch": 4.67,
3120
+ "learning_rate": 4.065122524337025e-06,
3121
+ "loss": 1.0578,
3122
+ "step": 49500
3123
+ },
3124
+ {
3125
+ "epoch": 4.68,
3126
+ "learning_rate": 3.947633434038268e-06,
3127
+ "loss": 1.0838,
3128
+ "step": 49600
3129
+ },
3130
+ {
3131
+ "epoch": 4.69,
3132
+ "learning_rate": 3.83014434373951e-06,
3133
+ "loss": 1.0704,
3134
+ "step": 49700
3135
+ },
3136
+ {
3137
+ "epoch": 4.7,
3138
+ "learning_rate": 3.7126552534407517e-06,
3139
+ "loss": 1.0318,
3140
+ "step": 49800
3141
+ },
3142
+ {
3143
+ "epoch": 4.71,
3144
+ "learning_rate": 3.595166163141994e-06,
3145
+ "loss": 1.0619,
3146
+ "step": 49900
3147
+ },
3148
+ {
3149
+ "epoch": 4.72,
3150
+ "learning_rate": 3.477677072843236e-06,
3151
+ "loss": 1.052,
3152
+ "step": 50000
3153
+ },
3154
+ {
3155
+ "epoch": 4.72,
3156
+ "eval_bleu1": 29.793,
3157
+ "eval_bleu2": 16.882,
3158
+ "eval_bleu3": 9.6468,
3159
+ "eval_bleu4": 5.3654,
3160
+ "eval_gen_len": 50.6014,
3161
+ "eval_loss": 2.989078998565674,
3162
+ "eval_rouge1": 35.4597,
3163
+ "eval_rouge2": 12.0824,
3164
+ "eval_rougeL": 23.0161,
3165
+ "eval_runtime": 113.3288,
3166
+ "eval_samples_per_second": 3.785,
3167
+ "eval_steps_per_second": 0.035,
3168
+ "step": 50000
3169
+ },
3170
+ {
3171
+ "epoch": 4.73,
3172
+ "learning_rate": 3.3601879825444777e-06,
3173
+ "loss": 1.0508,
3174
+ "step": 50100
3175
+ },
3176
+ {
3177
+ "epoch": 4.74,
3178
+ "learning_rate": 3.24269889224572e-06,
3179
+ "loss": 1.052,
3180
+ "step": 50200
3181
+ },
3182
+ {
3183
+ "epoch": 4.75,
3184
+ "learning_rate": 3.1252098019469617e-06,
3185
+ "loss": 1.0357,
3186
+ "step": 50300
3187
+ },
3188
+ {
3189
+ "epoch": 4.76,
3190
+ "learning_rate": 3.007720711648204e-06,
3191
+ "loss": 1.0977,
3192
+ "step": 50400
3193
+ },
3194
+ {
3195
+ "epoch": 4.77,
3196
+ "learning_rate": 2.890231621349446e-06,
3197
+ "loss": 1.0845,
3198
+ "step": 50500
3199
+ },
3200
+ {
3201
+ "epoch": 4.78,
3202
+ "learning_rate": 2.772742531050688e-06,
3203
+ "loss": 1.0882,
3204
+ "step": 50600
3205
+ },
3206
+ {
3207
+ "epoch": 4.79,
3208
+ "learning_rate": 2.65525344075193e-06,
3209
+ "loss": 1.0633,
3210
+ "step": 50700
3211
+ },
3212
+ {
3213
+ "epoch": 4.8,
3214
+ "learning_rate": 2.537764350453172e-06,
3215
+ "loss": 1.0854,
3216
+ "step": 50800
3217
+ },
3218
+ {
3219
+ "epoch": 4.81,
3220
+ "learning_rate": 2.420275260154414e-06,
3221
+ "loss": 1.0948,
3222
+ "step": 50900
3223
+ },
3224
+ {
3225
+ "epoch": 4.81,
3226
+ "learning_rate": 2.3027861698556565e-06,
3227
+ "loss": 1.0605,
3228
+ "step": 51000
3229
+ },
3230
+ {
3231
+ "epoch": 4.82,
3232
+ "learning_rate": 2.185297079556898e-06,
3233
+ "loss": 1.0726,
3234
+ "step": 51100
3235
+ },
3236
+ {
3237
+ "epoch": 4.83,
3238
+ "learning_rate": 2.0678079892581405e-06,
3239
+ "loss": 1.0495,
3240
+ "step": 51200
3241
+ },
3242
+ {
3243
+ "epoch": 4.84,
3244
+ "learning_rate": 1.950318898959382e-06,
3245
+ "loss": 1.0376,
3246
+ "step": 51300
3247
+ },
3248
+ {
3249
+ "epoch": 4.85,
3250
+ "learning_rate": 1.8328298086606243e-06,
3251
+ "loss": 1.0678,
3252
+ "step": 51400
3253
+ },
3254
+ {
3255
+ "epoch": 4.86,
3256
+ "learning_rate": 1.7153407183618663e-06,
3257
+ "loss": 1.0557,
3258
+ "step": 51500
3259
+ },
3260
+ {
3261
+ "epoch": 4.87,
3262
+ "learning_rate": 1.5978516280631082e-06,
3263
+ "loss": 1.0985,
3264
+ "step": 51600
3265
+ },
3266
+ {
3267
+ "epoch": 4.88,
3268
+ "learning_rate": 1.4803625377643502e-06,
3269
+ "loss": 1.0851,
3270
+ "step": 51700
3271
+ },
3272
+ {
3273
+ "epoch": 4.89,
3274
+ "learning_rate": 1.3628734474655924e-06,
3275
+ "loss": 1.0664,
3276
+ "step": 51800
3277
+ },
3278
+ {
3279
+ "epoch": 4.9,
3280
+ "learning_rate": 1.2453843571668344e-06,
3281
+ "loss": 1.0403,
3282
+ "step": 51900
3283
+ },
3284
+ {
3285
+ "epoch": 4.91,
3286
+ "learning_rate": 1.1278952668680764e-06,
3287
+ "loss": 1.0449,
3288
+ "step": 52000
3289
+ },
3290
+ {
3291
+ "epoch": 4.92,
3292
+ "learning_rate": 1.0104061765693184e-06,
3293
+ "loss": 1.0334,
3294
+ "step": 52100
3295
+ },
3296
+ {
3297
+ "epoch": 4.93,
3298
+ "learning_rate": 8.929170862705605e-07,
3299
+ "loss": 1.0725,
3300
+ "step": 52200
3301
+ },
3302
+ {
3303
+ "epoch": 4.94,
3304
+ "learning_rate": 7.754279959718026e-07,
3305
+ "loss": 1.0981,
3306
+ "step": 52300
3307
+ },
3308
+ {
3309
+ "epoch": 4.95,
3310
+ "learning_rate": 6.579389056730447e-07,
3311
+ "loss": 1.0442,
3312
+ "step": 52400
3313
+ },
3314
+ {
3315
+ "epoch": 4.96,
3316
+ "learning_rate": 5.404498153742867e-07,
3317
+ "loss": 1.0475,
3318
+ "step": 52500
3319
+ },
3320
+ {
3321
+ "epoch": 4.97,
3322
+ "learning_rate": 4.229607250755287e-07,
3323
+ "loss": 1.0857,
3324
+ "step": 52600
3325
+ },
3326
+ {
3327
+ "epoch": 4.98,
3328
+ "learning_rate": 3.054716347767707e-07,
3329
+ "loss": 1.1035,
3330
+ "step": 52700
3331
+ },
3332
+ {
3333
+ "epoch": 4.98,
3334
+ "learning_rate": 1.8798254447801276e-07,
3335
+ "loss": 1.0935,
3336
+ "step": 52800
3337
+ },
3338
+ {
3339
+ "epoch": 4.99,
3340
+ "learning_rate": 7.049345417925478e-08,
3341
+ "loss": 1.0707,
3342
+ "step": 52900
3343
+ },
3344
+ {
3345
+ "epoch": 5.0,
3346
+ "step": 52960,
3347
+ "total_flos": 1.489824167150592e+16,
3348
+ "train_loss": 1.7010083939733822,
3349
+ "train_runtime": 6451.2169,
3350
+ "train_samples_per_second": 32.836,
3351
+ "train_steps_per_second": 8.209
3352
+ }
3353
+ ],
3354
+ "max_steps": 52960,
3355
+ "num_train_epochs": 5,
3356
+ "total_flos": 1.489824167150592e+16,
3357
+ "trial_name": null,
3358
+ "trial_params": null
3359
+ }