nemik committed on
Commit aef37d4
1 Parent(s): a5364b3

End of training

README.md CHANGED
@@ -26,16 +26,16 @@ model-index:
  metrics:
  - name: Accuracy
  type: accuracy
- value: 0.9422222222222222
+ value: 0.9444444444444444
  - name: F1
  type: f1
- value: 0.8488372093023255
+ value: 0.8544819557625145
  - name: Precision
  type: precision
- value: 0.8548009367681498
+ value: 0.8615023474178404
  - name: Recall
  type: recall
- value: 0.8429561200923787
+ value: 0.8475750577367206
  ---
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -45,11 +45,11 @@ should probably proofread and complete it, then remove this comment. -->
 
  This model is a fine-tuned version of [apple/mobilevitv2-1.0-imagenet1k-256](https://huggingface.co/apple/mobilevitv2-1.0-imagenet1k-256) on the webdataset dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.1545
- - Accuracy: 0.9422
- - F1: 0.8488
- - Precision: 0.8548
- - Recall: 0.8430
+ - Loss: 0.1539
+ - Accuracy: 0.9444
+ - F1: 0.8545
+ - Precision: 0.8615
+ - Recall: 0.8476
 
  ## Model description
 
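Both the old and the new metric blocks are internally consistent: F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). A quick Python check against the full-precision values in the model-index diff (this assumes the plain binary F1 definition; the card itself does not say how F1 was averaged):

```python
def f1_score(precision: float, recall: float) -> float:
    """Binary F1: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) from the README diff, before and after this commit.
runs = {
    "before": (0.8548009367681498, 0.8429561200923787, 0.8488372093023255),
    "after": (0.8615023474178404, 0.8475750577367206, 0.8544819557625145),
}

for name, (p, r, reported) in runs.items():
    computed = f1_score(p, r)
    print(f"{name}: computed F1 = {computed:.10f}, reported F1 = {reported:.10f}")
```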
all_results.json ADDED
@@ -0,0 +1,16 @@
+ {
+ "epoch": 30.0,
+ "eval_accuracy": 0.9444444444444444,
+ "eval_f1": 0.8544819557625145,
+ "eval_loss": 0.15389865636825562,
+ "eval_precision": 0.8615023474178404,
+ "eval_recall": 0.8475750577367206,
+ "eval_runtime": 2.2689,
+ "eval_samples_per_second": 99.169,
+ "eval_steps_per_second": 12.782,
+ "total_flos": 1.77124415883264e+17,
+ "train_loss": 0.20865077226482637,
+ "train_runtime": 373.9101,
+ "train_samples_per_second": 72.21,
+ "train_steps_per_second": 4.573
+ }
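The rate fields in all_results.json are enough to recover the size of the run: runtime times samples per second gives the number of samples processed, and runtime times steps per second gives the number of steps. A minimal Python sketch using the values above (the per-epoch split assumes every epoch saw the same number of samples, which the commit does not state):

```python
# Values copied from all_results.json in this commit.
results = {
    "epoch": 30.0,
    "train_runtime": 373.9101,
    "train_samples_per_second": 72.21,
    "train_steps_per_second": 4.573,
    "eval_runtime": 2.2689,
    "eval_samples_per_second": 99.169,
    "eval_steps_per_second": 12.782,
}

# The logged rates are rounded, so round the products back to whole counts.
train_samples = round(results["train_runtime"] * results["train_samples_per_second"])
train_steps = round(results["train_runtime"] * results["train_steps_per_second"])
eval_samples = round(results["eval_runtime"] * results["eval_samples_per_second"])
eval_steps = round(results["eval_runtime"] * results["eval_steps_per_second"])

# Assumes an equal split across epochs.
epochs = int(results["epoch"])
samples_per_epoch = train_samples // epochs
steps_per_epoch = train_steps // epochs

# Fractional epoch reached after the first 10 optimizer steps.
epoch_at_step_10 = 10 / steps_per_epoch

print(f"train: {train_samples} samples, {train_steps} steps")
print(f"per epoch: {samples_per_epoch} samples, {steps_per_epoch} steps")
print(f"eval: {eval_samples} samples in {eval_steps} steps")
print(f"epoch after step 10: {epoch_at_step_10}")
```

The recovered 1710 steps match global_step in trainer_state.json, and 57 steps per epoch matches its epoch column (step 10 is logged at epoch 0.17543859649122806, which is 10/57).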
eval_results.json ADDED
@@ -0,0 +1,11 @@
+ {
+ "epoch": 30.0,
+ "eval_accuracy": 0.9444444444444444,
+ "eval_f1": 0.8544819557625145,
+ "eval_loss": 0.15389865636825562,
+ "eval_precision": 0.8615023474178404,
+ "eval_recall": 0.8475750577367206,
+ "eval_runtime": 2.2689,
+ "eval_samples_per_second": 99.169,
+ "eval_steps_per_second": 12.782
+ }
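The metrics in eval_results.json (and in the updated README) are the ones trainer_state.json logs for the step-1000 evaluation: that entry's eval_loss equals best_metric, and best_model_checkpoint points at checkpoint-1000. Only the runtime fields differ (2.2689 s here versus 1.7887 s in the log), which suggests a separate final evaluation of the same weights, though the commit does not say so. A Python sketch that locates the log entry for best_metric, using a hand-abbreviated excerpt of log_history rather than the full file:

```python
# Hand-copied excerpt of trainer_state.json's log_history from this commit.
# Abbreviated: one training entry is kept to show that not every entry has
# eval_* keys; each eval entry keeps only step, eval_loss and eval_accuracy.
log_history = [
    {"step": 100, "eval_loss": 0.6512863039970398, "eval_accuracy": 0.7604444444444445},
    {"step": 200, "eval_loss": 0.3972250819206238, "eval_accuracy": 0.8768888888888889},
    {"step": 300, "eval_loss": 0.2404223531484604, "eval_accuracy": 0.9226666666666666},
    {"step": 400, "eval_loss": 0.1941838562488556, "eval_accuracy": 0.9346666666666666},
    {"step": 500, "eval_loss": 0.17720411717891693, "eval_accuracy": 0.9364444444444444},
    {"step": 600, "eval_loss": 0.16532927751541138, "eval_accuracy": 0.9342222222222222},
    {"step": 700, "eval_loss": 0.16492225229740143, "eval_accuracy": 0.9408888888888889},
    {"step": 800, "eval_loss": 0.15682315826416016, "eval_accuracy": 0.9466666666666667},
    {"step": 900, "eval_loss": 0.15481138229370117, "eval_accuracy": 0.9431111111111111},
    {"step": 1000, "loss": 0.1485},  # a training-loss entry: no eval_loss key
    {"step": 1000, "eval_loss": 0.15389865636825562, "eval_accuracy": 0.9444444444444444},
    {"step": 1100, "eval_loss": 0.15210777521133423, "eval_accuracy": 0.944},
    {"step": 1200, "eval_loss": 0.155166357755661, "eval_accuracy": 0.9417777777777778},
]
best_metric = 0.15389865636825562  # "best_metric" in trainer_state.json
eval_results = {"eval_loss": 0.15389865636825562, "eval_accuracy": 0.9444444444444444}


def find_entry_for_metric(history, value, key="eval_loss"):
    """Return the first log entry whose `key` equals `value` exactly."""
    for entry in history:
        if entry.get(key) == value:
            return entry
    raise LookupError(f"no log entry with {key} == {value}")


best = find_entry_for_metric(log_history, best_metric)
eval_entries = [e for e in log_history if "eval_loss" in e]
lowest = min(eval_entries, key=lambda e: e["eval_loss"])
print("entry matching best_metric:", best["step"])
print("lowest logged eval_loss:", lowest["step"])
```

Matching on best_metric matters here: the step-1100 evaluation logged a lower eval_loss (0.15210777521133423), so taking the minimum over all logged evaluations picks a different step than the checkpoint that was kept. The commit does not record why; one possibility is that the best model is only tracked at checkpoint-save steps, which may be less frequent than the 100-step evaluation interval.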
runs/Oct22_02-50-10_7bf328c0e77a/events.out.tfevents.1729566009.7bf328c0e77a.215.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0d4b8c7a57e2e385e416ecaeb7cf42ddab99ec5e21e881a69fe9bb3b12466e47
+ size 560
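The TensorBoard event file under runs/ is tracked with Git LFS, so the diff shows only its pointer file: a spec version, the object's SHA-256 and its size in bytes. A small Python sketch that parses such a pointer (it assumes the one "key value" pair per line layout shown here and does not fetch the object itself):

```python
# Pointer text copied from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:0d4b8c7a57e2e385e416ecaeb7cf42ddab99ec5e21e881a69fe9bb3b12466e47
size 560
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each non-empty 'key value' line and decode the oid and size fields."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algorithm": algorithm,
        "oid": digest,
        "size": int(fields["size"]),
    }


info = parse_lfs_pointer(POINTER)
print(info["oid_algorithm"], info["oid"][:12], info["size"])
```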
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 30.0,
+ "total_flos": 1.77124415883264e+17,
+ "train_loss": 0.20865077226482637,
+ "train_runtime": 373.9101,
+ "train_samples_per_second": 72.21,
+ "train_steps_per_second": 4.573
+ }
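The trainer_state.json added in this commit logs the learning rate every 10 steps. The early values fit a linear warmup followed by a linear decay to zero (the same shape as the linear-with-warmup schedule in transformers), with a peak of 2e-4 reached after 171 steps, i.e. 10% of the 1710 total. The peak and the warmup length are inferred from the logged numbers, not stated anywhere in the commit. A Python sketch of that schedule as a plain function:

```python
# Assumed values, inferred from the logged learning rates (not stated in the commit).
PEAK_LR = 2e-4
TOTAL_STEPS = 1710  # global_step in trainer_state.json
WARMUP_STEPS = 171  # 10% of TOTAL_STEPS


def linear_warmup_decay(step: int) -> float:
    """Linear warmup from 0 to PEAK_LR, then linear decay to 0 at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


# A few (step, learning_rate) pairs copied from log_history.
logged = {
    10: 1.1695906432748537e-05,
    100: 0.00011695906432748539,
    190: 0.00019753086419753085,
    500: 0.00015724496426250813,
    700: 0.0001312540610786225,
}

for step, lr in logged.items():
    print(f"step {step}: logged {lr:.6e}, formula {linear_warmup_decay(step):.6e}")
```

From step 730 onward, in the part of the log shown here, the logged values sit one step behind this formula (step 730 logs 0.00012748538011695908, which is the formula's value at step 729), so the sketch reproduces only the early part of the log; the commit does not record why.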
trainer_state.json ADDED
@@ -0,0 +1,1443 @@
+ {
+ "best_metric": 0.15389865636825562,
+ "best_model_checkpoint": "mobilevitv2-1.0-imagenet1k-256-finetuned_v2024-10-21-frost/checkpoint-1000",
+ "epoch": 30.0,
+ "eval_steps": 100,
+ "global_step": 1710,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.17543859649122806,
+ "grad_norm": 0.3124828040599823,
+ "learning_rate": 1.1695906432748537e-05,
+ "loss": 0.6955,
+ "step": 10
+ },
+ {
+ "epoch": 0.3508771929824561,
+ "grad_norm": 0.24917739629745483,
+ "learning_rate": 2.3391812865497074e-05,
+ "loss": 0.6942,
+ "step": 20
+ },
+ {
+ "epoch": 0.5263157894736842,
+ "grad_norm": 0.2268371284008026,
+ "learning_rate": 3.508771929824561e-05,
+ "loss": 0.6939,
+ "step": 30
+ },
+ {
+ "epoch": 0.7017543859649122,
+ "grad_norm": 0.2435961812734604,
+ "learning_rate": 4.678362573099415e-05,
+ "loss": 0.6918,
+ "step": 40
+ },
+ {
+ "epoch": 0.8771929824561403,
+ "grad_norm": 0.24638999998569489,
+ "learning_rate": 5.847953216374269e-05,
+ "loss": 0.6889,
+ "step": 50
+ },
+ {
+ "epoch": 1.0526315789473684,
+ "grad_norm": 0.2426590472459793,
+ "learning_rate": 7.017543859649122e-05,
+ "loss": 0.6854,
+ "step": 60
+ },
+ {
+ "epoch": 1.2280701754385965,
+ "grad_norm": 0.26534757018089294,
+ "learning_rate": 8.187134502923976e-05,
+ "loss": 0.6803,
+ "step": 70
+ },
+ {
+ "epoch": 1.4035087719298245,
+ "grad_norm": 0.2573549449443817,
+ "learning_rate": 9.35672514619883e-05,
+ "loss": 0.6763,
+ "step": 80
+ },
+ {
+ "epoch": 1.5789473684210527,
+ "grad_norm": 0.2639031410217285,
+ "learning_rate": 0.00010526315789473685,
+ "loss": 0.6701,
+ "step": 90
+ },
+ {
+ "epoch": 1.7543859649122808,
+ "grad_norm": 0.26114630699157715,
+ "learning_rate": 0.00011695906432748539,
+ "loss": 0.6635,
+ "step": 100
+ },
+ {
+ "epoch": 1.7543859649122808,
+ "eval_accuracy": 0.7604444444444445,
+ "eval_f1": 0.5705179282868525,
+ "eval_loss": 0.6512863039970398,
+ "eval_precision": 0.43552311435523117,
+ "eval_recall": 0.8267898383371824,
+ "eval_runtime": 2.9095,
+ "eval_samples_per_second": 77.332,
+ "eval_steps_per_second": 9.967,
+ "step": 100
+ },
+ {
+ "epoch": 1.9298245614035088,
+ "grad_norm": 0.3371104896068573,
+ "learning_rate": 0.0001286549707602339,
+ "loss": 0.6502,
+ "step": 110
+ },
+ {
+ "epoch": 2.1052631578947367,
+ "grad_norm": 0.31244638562202454,
+ "learning_rate": 0.00014035087719298245,
+ "loss": 0.6343,
+ "step": 120
+ },
+ {
+ "epoch": 2.280701754385965,
+ "grad_norm": 0.47065746784210205,
+ "learning_rate": 0.00015204678362573098,
+ "loss": 0.6161,
+ "step": 130
+ },
+ {
+ "epoch": 2.456140350877193,
+ "grad_norm": 0.41640815138816833,
+ "learning_rate": 0.00016374269005847952,
+ "loss": 0.588,
+ "step": 140
+ },
+ {
+ "epoch": 2.6315789473684212,
+ "grad_norm": 0.34670090675354004,
+ "learning_rate": 0.00017543859649122806,
+ "loss": 0.5565,
+ "step": 150
+ },
+ {
+ "epoch": 2.807017543859649,
+ "grad_norm": 0.384328693151474,
+ "learning_rate": 0.0001871345029239766,
+ "loss": 0.5242,
+ "step": 160
+ },
+ {
+ "epoch": 2.982456140350877,
+ "grad_norm": 0.4133964478969574,
+ "learning_rate": 0.00019883040935672513,
+ "loss": 0.5158,
+ "step": 170
+ },
+ {
+ "epoch": 3.1578947368421053,
+ "grad_norm": 0.4693595767021179,
+ "learning_rate": 0.00019883040935672513,
+ "loss": 0.4658,
+ "step": 180
+ },
+ {
+ "epoch": 3.3333333333333335,
+ "grad_norm": 0.41811782121658325,
+ "learning_rate": 0.00019753086419753085,
+ "loss": 0.4297,
+ "step": 190
+ },
+ {
+ "epoch": 3.5087719298245617,
+ "grad_norm": 0.8540976643562317,
+ "learning_rate": 0.00019623131903833657,
+ "loss": 0.4461,
+ "step": 200
+ },
+ {
+ "epoch": 3.5087719298245617,
+ "eval_accuracy": 0.8768888888888889,
+ "eval_f1": 0.729227761485826,
+ "eval_loss": 0.3972250819206238,
+ "eval_precision": 0.6322033898305085,
+ "eval_recall": 0.8614318706697459,
+ "eval_runtime": 1.766,
+ "eval_samples_per_second": 127.406,
+ "eval_steps_per_second": 16.421,
+ "step": 200
+ },
+ {
+ "epoch": 3.6842105263157894,
+ "grad_norm": 0.8259305357933044,
+ "learning_rate": 0.0001949317738791423,
+ "loss": 0.3914,
+ "step": 210
+ },
+ {
+ "epoch": 3.8596491228070176,
+ "grad_norm": 0.8546284437179565,
+ "learning_rate": 0.00019363222871994802,
+ "loss": 0.384,
+ "step": 220
+ },
+ {
+ "epoch": 4.035087719298246,
+ "grad_norm": 0.3827027678489685,
+ "learning_rate": 0.00019233268356075374,
+ "loss": 0.3497,
+ "step": 230
+ },
+ {
+ "epoch": 4.2105263157894735,
+ "grad_norm": 0.6248043775558472,
+ "learning_rate": 0.00019103313840155946,
+ "loss": 0.3648,
+ "step": 240
+ },
+ {
+ "epoch": 4.385964912280702,
+ "grad_norm": 0.5684685111045837,
+ "learning_rate": 0.00018973359324236518,
+ "loss": 0.3112,
+ "step": 250
+ },
+ {
+ "epoch": 4.56140350877193,
+ "grad_norm": 0.5080260634422302,
+ "learning_rate": 0.0001884340480831709,
+ "loss": 0.3059,
+ "step": 260
+ },
+ {
+ "epoch": 4.7368421052631575,
+ "grad_norm": 0.5282370448112488,
+ "learning_rate": 0.0001871345029239766,
+ "loss": 0.2922,
+ "step": 270
+ },
+ {
+ "epoch": 4.912280701754386,
+ "grad_norm": 0.7253307104110718,
+ "learning_rate": 0.00018583495776478232,
+ "loss": 0.2909,
+ "step": 280
+ },
+ {
+ "epoch": 5.087719298245614,
+ "grad_norm": 0.7058104276657104,
+ "learning_rate": 0.00018453541260558804,
+ "loss": 0.2922,
+ "step": 290
+ },
+ {
+ "epoch": 5.2631578947368425,
+ "grad_norm": 1.1993378400802612,
+ "learning_rate": 0.00018323586744639376,
+ "loss": 0.2599,
+ "step": 300
+ },
+ {
+ "epoch": 5.2631578947368425,
+ "eval_accuracy": 0.9226666666666666,
+ "eval_f1": 0.804932735426009,
+ "eval_loss": 0.2404223531484604,
+ "eval_precision": 0.7821350762527233,
+ "eval_recall": 0.8290993071593533,
+ "eval_runtime": 2.7313,
+ "eval_samples_per_second": 82.378,
+ "eval_steps_per_second": 10.618,
+ "step": 300
+ },
+ {
+ "epoch": 5.43859649122807,
+ "grad_norm": 0.8134835362434387,
+ "learning_rate": 0.00018193632228719948,
+ "loss": 0.2645,
+ "step": 310
+ },
+ {
+ "epoch": 5.614035087719298,
+ "grad_norm": 0.7742730975151062,
+ "learning_rate": 0.0001806367771280052,
+ "loss": 0.2345,
+ "step": 320
+ },
+ {
+ "epoch": 5.7894736842105265,
+ "grad_norm": 0.5191880464553833,
+ "learning_rate": 0.00017933723196881092,
+ "loss": 0.2504,
+ "step": 330
+ },
+ {
+ "epoch": 5.964912280701754,
+ "grad_norm": 0.7682189345359802,
+ "learning_rate": 0.00017803768680961664,
+ "loss": 0.2654,
+ "step": 340
+ },
+ {
+ "epoch": 6.140350877192983,
+ "grad_norm": 0.7704707384109497,
+ "learning_rate": 0.00017673814165042236,
+ "loss": 0.2431,
+ "step": 350
+ },
+ {
+ "epoch": 6.315789473684211,
+ "grad_norm": 0.9333469867706299,
+ "learning_rate": 0.00017543859649122806,
+ "loss": 0.2382,
+ "step": 360
+ },
+ {
+ "epoch": 6.491228070175438,
+ "grad_norm": 0.8412513136863708,
+ "learning_rate": 0.00017413905133203378,
+ "loss": 0.2207,
+ "step": 370
+ },
+ {
+ "epoch": 6.666666666666667,
+ "grad_norm": 0.7568041086196899,
+ "learning_rate": 0.0001728395061728395,
+ "loss": 0.2271,
+ "step": 380
+ },
+ {
+ "epoch": 6.842105263157895,
+ "grad_norm": 0.689445436000824,
+ "learning_rate": 0.00017153996101364522,
+ "loss": 0.2076,
+ "step": 390
+ },
+ {
+ "epoch": 7.017543859649122,
+ "grad_norm": 0.7390238046646118,
+ "learning_rate": 0.00017024041585445094,
+ "loss": 0.2074,
+ "step": 400
+ },
+ {
+ "epoch": 7.017543859649122,
+ "eval_accuracy": 0.9346666666666666,
+ "eval_f1": 0.8256227758007118,
+ "eval_loss": 0.1941838562488556,
+ "eval_precision": 0.848780487804878,
+ "eval_recall": 0.8036951501154734,
+ "eval_runtime": 1.7733,
+ "eval_samples_per_second": 126.88,
+ "eval_steps_per_second": 16.353,
+ "step": 400
+ },
+ {
+ "epoch": 7.192982456140351,
+ "grad_norm": 0.4645775258541107,
+ "learning_rate": 0.00016894087069525666,
+ "loss": 0.2233,
+ "step": 410
+ },
+ {
+ "epoch": 7.368421052631579,
+ "grad_norm": 0.6826916337013245,
+ "learning_rate": 0.00016764132553606238,
+ "loss": 0.1846,
+ "step": 420
+ },
+ {
+ "epoch": 7.543859649122807,
+ "grad_norm": 0.6299170851707458,
+ "learning_rate": 0.0001663417803768681,
+ "loss": 0.1807,
+ "step": 430
+ },
+ {
+ "epoch": 7.719298245614035,
+ "grad_norm": 0.40688008069992065,
+ "learning_rate": 0.00016504223521767383,
+ "loss": 0.1925,
+ "step": 440
+ },
+ {
+ "epoch": 7.894736842105263,
+ "grad_norm": 0.8310642242431641,
+ "learning_rate": 0.00016374269005847952,
+ "loss": 0.1906,
+ "step": 450
+ },
+ {
+ "epoch": 8.070175438596491,
+ "grad_norm": 0.7561126351356506,
+ "learning_rate": 0.00016244314489928524,
+ "loss": 0.2537,
+ "step": 460
+ },
+ {
+ "epoch": 8.24561403508772,
+ "grad_norm": 1.5505608320236206,
+ "learning_rate": 0.00016114359974009096,
+ "loss": 0.2134,
+ "step": 470
+ },
+ {
+ "epoch": 8.421052631578947,
+ "grad_norm": 0.5844523310661316,
+ "learning_rate": 0.00015984405458089668,
+ "loss": 0.1927,
+ "step": 480
+ },
+ {
+ "epoch": 8.596491228070175,
+ "grad_norm": 0.6846328377723694,
+ "learning_rate": 0.0001585445094217024,
+ "loss": 0.1843,
+ "step": 490
+ },
+ {
+ "epoch": 8.771929824561404,
+ "grad_norm": 0.5246126651763916,
+ "learning_rate": 0.00015724496426250813,
+ "loss": 0.167,
+ "step": 500
+ },
+ {
+ "epoch": 8.771929824561404,
+ "eval_accuracy": 0.9364444444444444,
+ "eval_f1": 0.8354430379746836,
+ "eval_loss": 0.17720411717891693,
+ "eval_precision": 0.8325688073394495,
+ "eval_recall": 0.8383371824480369,
+ "eval_runtime": 2.7456,
+ "eval_samples_per_second": 81.95,
+ "eval_steps_per_second": 10.562,
+ "step": 500
+ },
+ {
+ "epoch": 8.947368421052632,
+ "grad_norm": 0.9557002782821655,
+ "learning_rate": 0.00015594541910331385,
+ "loss": 0.1752,
+ "step": 510
+ },
+ {
+ "epoch": 9.12280701754386,
+ "grad_norm": 1.115300178527832,
+ "learning_rate": 0.00015464587394411957,
+ "loss": 0.2,
+ "step": 520
+ },
+ {
+ "epoch": 9.298245614035087,
+ "grad_norm": 0.6540657877922058,
+ "learning_rate": 0.00015334632878492526,
+ "loss": 0.158,
+ "step": 530
+ },
+ {
+ "epoch": 9.473684210526315,
+ "grad_norm": 0.8491069078445435,
+ "learning_rate": 0.00015204678362573098,
+ "loss": 0.1813,
+ "step": 540
+ },
+ {
+ "epoch": 9.649122807017545,
+ "grad_norm": 1.3543705940246582,
+ "learning_rate": 0.0001507472384665367,
+ "loss": 0.1951,
+ "step": 550
+ },
+ {
+ "epoch": 9.824561403508772,
+ "grad_norm": 0.8627998232841492,
+ "learning_rate": 0.00014944769330734243,
+ "loss": 0.1945,
+ "step": 560
+ },
+ {
+ "epoch": 10.0,
+ "grad_norm": 1.2822953462600708,
+ "learning_rate": 0.00014814814814814815,
+ "loss": 0.1591,
+ "step": 570
+ },
+ {
+ "epoch": 10.175438596491228,
+ "grad_norm": 0.6904670596122742,
+ "learning_rate": 0.00014684860298895387,
+ "loss": 0.1545,
+ "step": 580
+ },
+ {
+ "epoch": 10.350877192982455,
+ "grad_norm": 1.3155221939086914,
+ "learning_rate": 0.0001455490578297596,
+ "loss": 0.1385,
+ "step": 590
+ },
+ {
+ "epoch": 10.526315789473685,
+ "grad_norm": 0.8683547973632812,
+ "learning_rate": 0.0001442495126705653,
+ "loss": 0.1661,
+ "step": 600
+ },
+ {
+ "epoch": 10.526315789473685,
+ "eval_accuracy": 0.9342222222222222,
+ "eval_f1": 0.8258823529411765,
+ "eval_loss": 0.16532927751541138,
+ "eval_precision": 0.841726618705036,
+ "eval_recall": 0.8106235565819861,
+ "eval_runtime": 1.7784,
+ "eval_samples_per_second": 126.515,
+ "eval_steps_per_second": 16.306,
+ "step": 600
+ },
+ {
+ "epoch": 10.701754385964913,
+ "grad_norm": 0.7406933307647705,
+ "learning_rate": 0.00014294996751137103,
+ "loss": 0.1569,
+ "step": 610
+ },
+ {
+ "epoch": 10.87719298245614,
+ "grad_norm": 1.5100739002227783,
+ "learning_rate": 0.00014165042235217672,
+ "loss": 0.1873,
+ "step": 620
+ },
+ {
+ "epoch": 11.052631578947368,
+ "grad_norm": 0.8658424019813538,
+ "learning_rate": 0.00014035087719298245,
+ "loss": 0.1771,
+ "step": 630
+ },
+ {
+ "epoch": 11.228070175438596,
+ "grad_norm": 0.761426568031311,
+ "learning_rate": 0.00013905133203378817,
+ "loss": 0.1522,
+ "step": 640
+ },
+ {
+ "epoch": 11.403508771929825,
+ "grad_norm": 0.6994770765304565,
+ "learning_rate": 0.0001377517868745939,
+ "loss": 0.1462,
+ "step": 650
+ },
+ {
+ "epoch": 11.578947368421053,
+ "grad_norm": 0.6044259071350098,
+ "learning_rate": 0.0001364522417153996,
+ "loss": 0.1688,
+ "step": 660
+ },
+ {
+ "epoch": 11.75438596491228,
+ "grad_norm": 0.6377450227737427,
+ "learning_rate": 0.00013515269655620533,
+ "loss": 0.1726,
+ "step": 670
+ },
+ {
+ "epoch": 11.929824561403509,
+ "grad_norm": 0.45792627334594727,
+ "learning_rate": 0.00013385315139701105,
+ "loss": 0.1578,
+ "step": 680
+ },
+ {
+ "epoch": 12.105263157894736,
+ "grad_norm": 0.5658883452415466,
+ "learning_rate": 0.00013255360623781677,
+ "loss": 0.1528,
+ "step": 690
+ },
+ {
+ "epoch": 12.280701754385966,
+ "grad_norm": 0.568031370639801,
+ "learning_rate": 0.0001312540610786225,
+ "loss": 0.1603,
+ "step": 700
+ },
+ {
+ "epoch": 12.280701754385966,
+ "eval_accuracy": 0.9408888888888889,
+ "eval_f1": 0.8473019517795637,
+ "eval_loss": 0.16492225229740143,
+ "eval_precision": 0.8424657534246576,
+ "eval_recall": 0.8521939953810623,
+ "eval_runtime": 2.4488,
+ "eval_samples_per_second": 91.883,
+ "eval_steps_per_second": 11.843,
+ "step": 700
+ },
+ {
+ "epoch": 12.456140350877194,
+ "grad_norm": 0.8529219031333923,
+ "learning_rate": 0.0001299545159194282,
+ "loss": 0.1438,
+ "step": 710
+ },
+ {
+ "epoch": 12.631578947368421,
+ "grad_norm": 0.7960824370384216,
+ "learning_rate": 0.0001286549707602339,
+ "loss": 0.1245,
+ "step": 720
+ },
+ {
+ "epoch": 12.807017543859649,
+ "grad_norm": 0.8270284533500671,
+ "learning_rate": 0.00012748538011695908,
+ "loss": 0.1775,
+ "step": 730
+ },
+ {
+ "epoch": 12.982456140350877,
+ "grad_norm": 0.407463014125824,
+ "learning_rate": 0.0001261858349577648,
+ "loss": 0.1583,
+ "step": 740
+ },
+ {
+ "epoch": 13.157894736842104,
+ "grad_norm": 1.2405822277069092,
+ "learning_rate": 0.0001248862897985705,
+ "loss": 0.1412,
+ "step": 750
+ },
+ {
+ "epoch": 13.333333333333334,
+ "grad_norm": 0.7762990593910217,
+ "learning_rate": 0.00012358674463937622,
+ "loss": 0.137,
+ "step": 760
+ },
+ {
+ "epoch": 13.508771929824562,
+ "grad_norm": 0.7772154808044434,
+ "learning_rate": 0.00012228719948018194,
+ "loss": 0.1618,
+ "step": 770
+ },
+ {
+ "epoch": 13.68421052631579,
+ "grad_norm": 0.3346017599105835,
+ "learning_rate": 0.00012098765432098766,
+ "loss": 0.1276,
+ "step": 780
+ },
+ {
+ "epoch": 13.859649122807017,
+ "grad_norm": 0.7661828994750977,
+ "learning_rate": 0.00011968810916179338,
+ "loss": 0.1606,
+ "step": 790
+ },
+ {
+ "epoch": 14.035087719298245,
+ "grad_norm": 1.2454911470413208,
+ "learning_rate": 0.0001183885640025991,
+ "loss": 0.1523,
+ "step": 800
+ },
+ {
+ "epoch": 14.035087719298245,
+ "eval_accuracy": 0.9466666666666667,
+ "eval_f1": 0.8591549295774648,
+ "eval_loss": 0.15682315826416016,
+ "eval_precision": 0.8735083532219571,
+ "eval_recall": 0.8452655889145496,
+ "eval_runtime": 1.8011,
+ "eval_samples_per_second": 124.926,
+ "eval_steps_per_second": 16.102,
+ "step": 800
+ },
+ {
+ "epoch": 14.210526315789474,
+ "grad_norm": 3.0044612884521484,
+ "learning_rate": 0.00011708901884340481,
+ "loss": 0.1331,
+ "step": 810
+ },
+ {
+ "epoch": 14.385964912280702,
+ "grad_norm": 0.7117482423782349,
+ "learning_rate": 0.00011578947368421053,
+ "loss": 0.1619,
+ "step": 820
+ },
+ {
+ "epoch": 14.56140350877193,
+ "grad_norm": 0.6939218044281006,
+ "learning_rate": 0.00011448992852501626,
+ "loss": 0.1531,
+ "step": 830
+ },
+ {
+ "epoch": 14.736842105263158,
+ "grad_norm": 0.5622960329055786,
+ "learning_rate": 0.00011319038336582198,
+ "loss": 0.131,
+ "step": 840
+ },
+ {
+ "epoch": 14.912280701754385,
+ "grad_norm": 0.9399430155754089,
+ "learning_rate": 0.0001118908382066277,
+ "loss": 0.1276,
+ "step": 850
+ },
+ {
+ "epoch": 15.087719298245615,
+ "grad_norm": 1.6480320692062378,
+ "learning_rate": 0.0001105912930474334,
+ "loss": 0.1656,
+ "step": 860
+ },
+ {
+ "epoch": 15.263157894736842,
+ "grad_norm": 0.7238647937774658,
+ "learning_rate": 0.00010929174788823913,
+ "loss": 0.1261,
+ "step": 870
+ },
+ {
+ "epoch": 15.43859649122807,
+ "grad_norm": 1.0423846244812012,
+ "learning_rate": 0.00010799220272904485,
+ "loss": 0.1328,
+ "step": 880
+ },
+ {
+ "epoch": 15.614035087719298,
+ "grad_norm": 1.1374431848526,
+ "learning_rate": 0.00010669265756985057,
+ "loss": 0.1427,
+ "step": 890
+ },
+ {
+ "epoch": 15.789473684210526,
+ "grad_norm": 0.7375030517578125,
+ "learning_rate": 0.00010539311241065628,
+ "loss": 0.1506,
+ "step": 900
+ },
+ {
+ "epoch": 15.789473684210526,
+ "eval_accuracy": 0.9431111111111111,
+ "eval_f1": 0.8494117647058823,
+ "eval_loss": 0.15481138229370117,
+ "eval_precision": 0.8657074340527577,
+ "eval_recall": 0.8337182448036952,
+ "eval_runtime": 1.8243,
+ "eval_samples_per_second": 123.334,
+ "eval_steps_per_second": 15.896,
+ "step": 900
+ },
+ {
+ "epoch": 15.964912280701755,
+ "grad_norm": 0.7035567164421082,
+ "learning_rate": 0.000104093567251462,
+ "loss": 0.1324,
+ "step": 910
+ },
+ {
+ "epoch": 16.140350877192983,
+ "grad_norm": 0.6969211101531982,
+ "learning_rate": 0.00010279402209226772,
+ "loss": 0.1257,
+ "step": 920
+ },
+ {
+ "epoch": 16.31578947368421,
+ "grad_norm": 0.3633826673030853,
+ "learning_rate": 0.00010149447693307344,
+ "loss": 0.1306,
+ "step": 930
+ },
+ {
+ "epoch": 16.49122807017544,
+ "grad_norm": 0.8118802309036255,
+ "learning_rate": 0.00010019493177387915,
+ "loss": 0.1091,
+ "step": 940
+ },
+ {
+ "epoch": 16.666666666666668,
+ "grad_norm": 0.6684471964836121,
+ "learning_rate": 9.889538661468485e-05,
+ "loss": 0.1323,
+ "step": 950
+ },
+ {
+ "epoch": 16.842105263157894,
+ "grad_norm": 0.6080668568611145,
+ "learning_rate": 9.759584145549058e-05,
+ "loss": 0.1168,
+ "step": 960
+ },
+ {
+ "epoch": 17.017543859649123,
+ "grad_norm": 0.7799493670463562,
+ "learning_rate": 9.62962962962963e-05,
+ "loss": 0.141,
+ "step": 970
+ },
+ {
+ "epoch": 17.19298245614035,
+ "grad_norm": 0.5670738816261292,
+ "learning_rate": 9.499675113710202e-05,
+ "loss": 0.1244,
+ "step": 980
+ },
+ {
+ "epoch": 17.36842105263158,
+ "grad_norm": 0.9652756452560425,
+ "learning_rate": 9.369720597790773e-05,
+ "loss": 0.1354,
+ "step": 990
+ },
+ {
+ "epoch": 17.54385964912281,
+ "grad_norm": 0.8537412881851196,
+ "learning_rate": 9.239766081871345e-05,
+ "loss": 0.1485,
+ "step": 1000
+ },
+ {
+ "epoch": 17.54385964912281,
+ "eval_accuracy": 0.9444444444444444,
+ "eval_f1": 0.8544819557625145,
+ "eval_loss": 0.15389865636825562,
+ "eval_precision": 0.8615023474178404,
+ "eval_recall": 0.8475750577367206,
+ "eval_runtime": 1.7887,
+ "eval_samples_per_second": 125.789,
+ "eval_steps_per_second": 16.213,
+ "step": 1000
+ },
+ {
+ "epoch": 17.719298245614034,
+ "grad_norm": 0.9258742928504944,
+ "learning_rate": 9.109811565951917e-05,
+ "loss": 0.1284,
+ "step": 1010
+ },
+ {
+ "epoch": 17.894736842105264,
+ "grad_norm": 0.6817509531974792,
+ "learning_rate": 8.979857050032489e-05,
+ "loss": 0.1226,
+ "step": 1020
+ },
+ {
+ "epoch": 18.07017543859649,
+ "grad_norm": 0.8437041640281677,
+ "learning_rate": 8.849902534113061e-05,
+ "loss": 0.1527,
+ "step": 1030
+ },
+ {
+ "epoch": 18.24561403508772,
+ "grad_norm": 1.2362749576568604,
+ "learning_rate": 8.719948018193632e-05,
+ "loss": 0.1224,
+ "step": 1040
+ },
+ {
+ "epoch": 18.42105263157895,
+ "grad_norm": 0.4136218726634979,
+ "learning_rate": 8.589993502274204e-05,
+ "loss": 0.1293,
+ "step": 1050
+ },
+ {
+ "epoch": 18.596491228070175,
+ "grad_norm": 0.8913040161132812,
+ "learning_rate": 8.460038986354776e-05,
+ "loss": 0.1305,
+ "step": 1060
+ },
+ {
+ "epoch": 18.771929824561404,
+ "grad_norm": 1.0768448114395142,
+ "learning_rate": 8.330084470435348e-05,
+ "loss": 0.1134,
+ "step": 1070
+ },
+ {
+ "epoch": 18.94736842105263,
+ "grad_norm": 0.9289010763168335,
+ "learning_rate": 8.200129954515919e-05,
+ "loss": 0.1551,
+ "step": 1080
+ },
+ {
+ "epoch": 19.12280701754386,
+ "grad_norm": 0.4481465220451355,
+ "learning_rate": 8.070175438596491e-05,
+ "loss": 0.1263,
+ "step": 1090
+ },
+ {
+ "epoch": 19.29824561403509,
+ "grad_norm": 0.7408900260925293,
+ "learning_rate": 7.940220922677063e-05,
+ "loss": 0.1263,
+ "step": 1100
+ },
+ {
+ "epoch": 19.29824561403509,
+ "eval_accuracy": 0.944,
+ "eval_f1": 0.8534883720930233,
+ "eval_loss": 0.15210777521133423,
+ "eval_precision": 0.8594847775175644,
+ "eval_recall": 0.8475750577367206,
+ "eval_runtime": 1.7885,
+ "eval_samples_per_second": 125.802,
+ "eval_steps_per_second": 16.214,
+ "step": 1100
+ },
+ {
+ "epoch": 19.473684210526315,
+ "grad_norm": 0.8939012289047241,
+ "learning_rate": 7.810266406757635e-05,
+ "loss": 0.1206,
+ "step": 1110
+ },
+ {
+ "epoch": 19.649122807017545,
+ "grad_norm": 0.6809560656547546,
+ "learning_rate": 7.680311890838207e-05,
+ "loss": 0.1225,
+ "step": 1120
+ },
+ {
+ "epoch": 19.82456140350877,
+ "grad_norm": 1.1481623649597168,
+ "learning_rate": 7.550357374918778e-05,
+ "loss": 0.1291,
+ "step": 1130
+ },
+ {
+ "epoch": 20.0,
+ "grad_norm": 2.0011980533599854,
+ "learning_rate": 7.42040285899935e-05,
+ "loss": 0.1482,
+ "step": 1140
+ },
+ {
+ "epoch": 20.17543859649123,
+ "grad_norm": 0.6619019508361816,
+ "learning_rate": 7.290448343079922e-05,
+ "loss": 0.1123,
+ "step": 1150
+ },
+ {
+ "epoch": 20.350877192982455,
+ "grad_norm": 0.796700656414032,
+ "learning_rate": 7.160493827160494e-05,
+ "loss": 0.1166,
+ "step": 1160
+ },
+ {
+ "epoch": 20.526315789473685,
+ "grad_norm": 0.9634900689125061,
+ "learning_rate": 7.030539311241065e-05,
+ "loss": 0.1263,
+ "step": 1170
+ },
+ {
+ "epoch": 20.70175438596491,
+ "grad_norm": 0.505535900592804,
+ "learning_rate": 6.900584795321637e-05,
+ "loss": 0.1117,
+ "step": 1180
+ },
+ {
+ "epoch": 20.87719298245614,
+ "grad_norm": 0.5166471600532532,
+ "learning_rate": 6.770630279402209e-05,
+ "loss": 0.1279,
+ "step": 1190
+ },
+ {
+ "epoch": 21.05263157894737,
+ "grad_norm": 1.2773476839065552,
+ "learning_rate": 6.640675763482781e-05,
+ "loss": 0.1444,
+ "step": 1200
+ },
+ {
+ "epoch": 21.05263157894737,
+ "eval_accuracy": 0.9417777777777778,
+ "eval_f1": 0.8471411901983664,
+ "eval_loss": 0.155166357755661,
+ "eval_precision": 0.8561320754716981,
+ "eval_recall": 0.8383371824480369,
+ "eval_runtime": 2.37,
+ "eval_samples_per_second": 94.937,
+ "eval_steps_per_second": 12.236,
+ "step": 1200
+ },
995
+ {
996
+ "epoch": 21.228070175438596,
997
+ "grad_norm": 0.793021559715271,
998
+ "learning_rate": 6.510721247563352e-05,
999
+ "loss": 0.1168,
1000
+ "step": 1210
1001
+ },
1002
+ {
1003
+ "epoch": 21.403508771929825,
1004
+ "grad_norm": 1.2551689147949219,
1005
+ "learning_rate": 6.380766731643924e-05,
1006
+ "loss": 0.1089,
1007
+ "step": 1220
1008
+ },
1009
+ {
1010
+ "epoch": 21.57894736842105,
1011
+ "grad_norm": 0.6803563237190247,
1012
+ "learning_rate": 6.250812215724496e-05,
1013
+ "loss": 0.1186,
1014
+ "step": 1230
1015
+ },
1016
+ {
1017
+ "epoch": 21.75438596491228,
1018
+ "grad_norm": 1.2632770538330078,
1019
+ "learning_rate": 6.120857699805068e-05,
1020
+ "loss": 0.1116,
1021
+ "step": 1240
1022
+ },
1023
+ {
1024
+ "epoch": 21.92982456140351,
1025
+ "grad_norm": 0.525141716003418,
1026
+ "learning_rate": 5.99090318388564e-05,
1027
+ "loss": 0.0979,
1028
+ "step": 1250
1029
+ },
1030
+ {
1031
+ "epoch": 22.105263157894736,
1032
+ "grad_norm": 0.5942980647087097,
1033
+ "learning_rate": 5.860948667966212e-05,
1034
+ "loss": 0.1483,
1035
+ "step": 1260
1036
+ },
1037
+ {
1038
+ "epoch": 22.280701754385966,
1039
+ "grad_norm": 1.0624207258224487,
1040
+ "learning_rate": 5.7309941520467835e-05,
1041
+ "loss": 0.1155,
1042
+ "step": 1270
1043
+ },
1044
+ {
1045
+ "epoch": 22.45614035087719,
1046
+ "grad_norm": 0.6244792938232422,
1047
+ "learning_rate": 5.6010396361273556e-05,
1048
+ "loss": 0.1159,
1049
+ "step": 1280
1050
+ },
1051
+ {
1052
+ "epoch": 22.63157894736842,
1053
+ "grad_norm": 1.9767743349075317,
1054
+ "learning_rate": 5.471085120207927e-05,
1055
+ "loss": 0.1165,
1056
+ "step": 1290
1057
+ },
1058
+ {
1059
+ "epoch": 22.80701754385965,
1060
+ "grad_norm": 2.270113468170166,
1061
+ "learning_rate": 5.341130604288499e-05,
1062
+ "loss": 0.1133,
1063
+ "step": 1300
1064
+ },
1065
+ {
1066
+ "epoch": 22.80701754385965,
1067
+ "eval_accuracy": 0.9448888888888889,
1068
+ "eval_f1": 0.8561484918793504,
1069
+ "eval_loss": 0.1531468778848648,
1070
+ "eval_precision": 0.8601398601398601,
1071
+ "eval_recall": 0.8521939953810623,
1072
+ "eval_runtime": 4.5112,
1073
+ "eval_samples_per_second": 49.875,
1074
+ "eval_steps_per_second": 6.428,
1075
+ "step": 1300
1076
+ },
1077
+ {
1078
+ "epoch": 22.982456140350877,
1079
+ "grad_norm": 2.3252851963043213,
1080
+ "learning_rate": 5.2111760883690706e-05,
1081
+ "loss": 0.1018,
1082
+ "step": 1310
1083
+ },
1084
+ {
1085
+ "epoch": 23.157894736842106,
1086
+ "grad_norm": 1.3282454013824463,
1087
+ "learning_rate": 5.081221572449643e-05,
1088
+ "loss": 0.1194,
1089
+ "step": 1320
1090
+ },
1091
+ {
1092
+ "epoch": 23.333333333333332,
1093
+ "grad_norm": 0.652642548084259,
1094
+ "learning_rate": 4.951267056530214e-05,
1095
+ "loss": 0.1016,
1096
+ "step": 1330
1097
+ },
1098
+ {
1099
+ "epoch": 23.50877192982456,
1100
+ "grad_norm": 1.584074854850769,
1101
+ "learning_rate": 4.821312540610786e-05,
1102
+ "loss": 0.1109,
1103
+ "step": 1340
1104
+ },
1105
+ {
1106
+ "epoch": 23.68421052631579,
1107
+ "grad_norm": 0.5799722075462341,
1108
+ "learning_rate": 4.691358024691358e-05,
1109
+ "loss": 0.0901,
1110
+ "step": 1350
1111
+ },
1112
+ {
1113
+ "epoch": 23.859649122807017,
1114
+ "grad_norm": 1.9589979648590088,
1115
+ "learning_rate": 4.56140350877193e-05,
1116
+ "loss": 0.1195,
1117
+ "step": 1360
1118
+ },
1119
+ {
1120
+ "epoch": 24.035087719298247,
1121
+ "grad_norm": 0.784710705280304,
1122
+ "learning_rate": 4.431448992852502e-05,
1123
+ "loss": 0.1318,
1124
+ "step": 1370
1125
+ },
1126
+ {
1127
+ "epoch": 24.210526315789473,
1128
+ "grad_norm": 1.0715792179107666,
1129
+ "learning_rate": 4.301494476933073e-05,
1130
+ "loss": 0.1236,
1131
+ "step": 1380
1132
+ },
1133
+ {
1134
+ "epoch": 24.385964912280702,
1135
+ "grad_norm": 0.8761755228042603,
1136
+ "learning_rate": 4.1715399610136454e-05,
1137
+ "loss": 0.1076,
1138
+ "step": 1390
1139
+ },
1140
+ {
1141
+ "epoch": 24.56140350877193,
1142
+ "grad_norm": 0.8874859809875488,
1143
+ "learning_rate": 4.041585445094217e-05,
1144
+ "loss": 0.1019,
1145
+ "step": 1400
1146
+ },
1147
+ {
1148
+ "epoch": 24.56140350877193,
1149
+ "eval_accuracy": 0.9431111111111111,
1150
+ "eval_f1": 0.8490566037735849,
1151
+ "eval_loss": 0.15768744051456451,
1152
+ "eval_precision": 0.8674698795180723,
1153
+ "eval_recall": 0.8314087759815243,
1154
+ "eval_runtime": 1.817,
1155
+ "eval_samples_per_second": 123.828,
1156
+ "eval_steps_per_second": 15.96,
1157
+ "step": 1400
1158
+ },
1159
+ {
1160
+ "epoch": 24.736842105263158,
1161
+ "grad_norm": 0.569615364074707,
1162
+ "learning_rate": 3.911630929174789e-05,
1163
+ "loss": 0.1114,
1164
+ "step": 1410
1165
+ },
1166
+ {
1167
+ "epoch": 24.912280701754387,
1168
+ "grad_norm": 0.4636388123035431,
1169
+ "learning_rate": 3.7816764132553604e-05,
1170
+ "loss": 0.1016,
1171
+ "step": 1420
1172
+ },
1173
+ {
1174
+ "epoch": 25.087719298245613,
1175
+ "grad_norm": 0.7966068983078003,
1176
+ "learning_rate": 3.6517218973359325e-05,
1177
+ "loss": 0.1181,
1178
+ "step": 1430
1179
+ },
1180
+ {
1181
+ "epoch": 25.263157894736842,
1182
+ "grad_norm": 0.7331326603889465,
1183
+ "learning_rate": 3.521767381416504e-05,
1184
+ "loss": 0.1037,
1185
+ "step": 1440
1186
+ },
1187
+ {
1188
+ "epoch": 25.43859649122807,
1189
+ "grad_norm": 1.1376439332962036,
1190
+ "learning_rate": 3.391812865497076e-05,
1191
+ "loss": 0.091,
1192
+ "step": 1450
1193
+ },
1194
+ {
1195
+ "epoch": 25.614035087719298,
1196
+ "grad_norm": 0.43491020798683167,
1197
+ "learning_rate": 3.2618583495776475e-05,
1198
+ "loss": 0.102,
1199
+ "step": 1460
1200
+ },
1201
+ {
1202
+ "epoch": 25.789473684210527,
1203
+ "grad_norm": 0.9410120844841003,
1204
+ "learning_rate": 3.1319038336582196e-05,
1205
+ "loss": 0.1108,
1206
+ "step": 1470
1207
+ },
1208
+ {
1209
+ "epoch": 25.964912280701753,
1210
+ "grad_norm": 0.9321810603141785,
1211
+ "learning_rate": 3.0019493177387914e-05,
1212
+ "loss": 0.1059,
1213
+ "step": 1480
1214
+ },
1215
+ {
1216
+ "epoch": 26.140350877192983,
1217
+ "grad_norm": 0.5571371912956238,
1218
+ "learning_rate": 2.871994801819363e-05,
1219
+ "loss": 0.0926,
1220
+ "step": 1490
1221
+ },
1222
+ {
1223
+ "epoch": 26.31578947368421,
1224
+ "grad_norm": 1.9081007242202759,
1225
+ "learning_rate": 2.742040285899935e-05,
1226
+ "loss": 0.1141,
1227
+ "step": 1500
1228
+ },
1229
+ {
1230
+ "epoch": 26.31578947368421,
1231
+ "eval_accuracy": 0.9413333333333334,
1232
+ "eval_f1": 0.8472222222222222,
1233
+ "eval_loss": 0.15601032972335815,
1234
+ "eval_precision": 0.8491879350348028,
1235
+ "eval_recall": 0.8452655889145496,
1236
+ "eval_runtime": 1.867,
1237
+ "eval_samples_per_second": 120.511,
1238
+ "eval_steps_per_second": 15.533,
1239
+ "step": 1500
1240
+ },
1241
+ {
1242
+ "epoch": 26.49122807017544,
1243
+ "grad_norm": 0.8356673121452332,
1244
+ "learning_rate": 2.6120857699805067e-05,
1245
+ "loss": 0.1077,
1246
+ "step": 1510
1247
+ },
1248
+ {
1249
+ "epoch": 26.666666666666668,
1250
+ "grad_norm": 1.3644295930862427,
1251
+ "learning_rate": 2.4821312540610784e-05,
1252
+ "loss": 0.1212,
1253
+ "step": 1520
1254
+ },
1255
+ {
1256
+ "epoch": 26.842105263157894,
1257
+ "grad_norm": 0.779222309589386,
1258
+ "learning_rate": 2.3521767381416506e-05,
1259
+ "loss": 0.1229,
1260
+ "step": 1530
1261
+ },
1262
+ {
1263
+ "epoch": 27.017543859649123,
1264
+ "grad_norm": 0.5873481631278992,
1265
+ "learning_rate": 2.2222222222222223e-05,
1266
+ "loss": 0.0998,
1267
+ "step": 1540
1268
+ },
1269
+ {
1270
+ "epoch": 27.19298245614035,
1271
+ "grad_norm": 0.9948704242706299,
1272
+ "learning_rate": 2.092267706302794e-05,
1273
+ "loss": 0.1435,
1274
+ "step": 1550
1275
+ },
1276
+ {
1277
+ "epoch": 27.36842105263158,
1278
+ "grad_norm": 0.32820120453834534,
1279
+ "learning_rate": 1.962313190383366e-05,
1280
+ "loss": 0.0992,
1281
+ "step": 1560
1282
+ },
1283
+ {
1284
+ "epoch": 27.54385964912281,
1285
+ "grad_norm": 1.0797744989395142,
1286
+ "learning_rate": 1.8323586744639376e-05,
1287
+ "loss": 0.1095,
1288
+ "step": 1570
1289
+ },
1290
+ {
1291
+ "epoch": 27.719298245614034,
1292
+ "grad_norm": 1.5036197900772095,
1293
+ "learning_rate": 1.7024041585445094e-05,
1294
+ "loss": 0.119,
1295
+ "step": 1580
1296
+ },
1297
+ {
1298
+ "epoch": 27.894736842105264,
1299
+ "grad_norm": 1.0871007442474365,
1300
+ "learning_rate": 1.5724496426250812e-05,
1301
+ "loss": 0.0974,
1302
+ "step": 1590
1303
+ },
1304
+ {
1305
+ "epoch": 28.07017543859649,
1306
+ "grad_norm": 0.6861986517906189,
1307
+ "learning_rate": 1.442495126705653e-05,
1308
+ "loss": 0.1087,
1309
+ "step": 1600
1310
+ },
1311
+ {
1312
+ "epoch": 28.07017543859649,
1313
+ "eval_accuracy": 0.9422222222222222,
1314
+ "eval_f1": 0.8491879350348028,
1315
+ "eval_loss": 0.15734025835990906,
1316
+ "eval_precision": 0.8531468531468531,
1317
+ "eval_recall": 0.8452655889145496,
1318
+ "eval_runtime": 3.5904,
1319
+ "eval_samples_per_second": 62.668,
1320
+ "eval_steps_per_second": 8.077,
1321
+ "step": 1600
1322
+ },
1323
+ {
1324
+ "epoch": 28.24561403508772,
1325
+ "grad_norm": 1.5399742126464844,
1326
+ "learning_rate": 1.3125406107862247e-05,
1327
+ "loss": 0.1243,
1328
+ "step": 1610
1329
+ },
1330
+ {
1331
+ "epoch": 28.42105263157895,
1332
+ "grad_norm": 0.7721771001815796,
1333
+ "learning_rate": 1.1825860948667967e-05,
1334
+ "loss": 0.0965,
1335
+ "step": 1620
1336
+ },
1337
+ {
1338
+ "epoch": 28.596491228070175,
1339
+ "grad_norm": 1.040131688117981,
1340
+ "learning_rate": 1.0526315789473684e-05,
1341
+ "loss": 0.1133,
1342
+ "step": 1630
1343
+ },
1344
+ {
1345
+ "epoch": 28.771929824561404,
1346
+ "grad_norm": 0.9755656123161316,
1347
+ "learning_rate": 9.226770630279402e-06,
1348
+ "loss": 0.0885,
1349
+ "step": 1640
1350
+ },
1351
+ {
1352
+ "epoch": 28.94736842105263,
1353
+ "grad_norm": 0.5838367342948914,
1354
+ "learning_rate": 7.92722547108512e-06,
1355
+ "loss": 0.1134,
1356
+ "step": 1650
1357
+ },
1358
+ {
1359
+ "epoch": 29.12280701754386,
1360
+ "grad_norm": 1.698116421699524,
1361
+ "learning_rate": 6.6276803118908384e-06,
1362
+ "loss": 0.1278,
1363
+ "step": 1660
1364
+ },
1365
+ {
1366
+ "epoch": 29.29824561403509,
1367
+ "grad_norm": 0.581572413444519,
1368
+ "learning_rate": 5.328135152696556e-06,
1369
+ "loss": 0.1209,
1370
+ "step": 1670
1371
+ },
1372
+ {
1373
+ "epoch": 29.473684210526315,
1374
+ "grad_norm": 0.4100797772407532,
1375
+ "learning_rate": 4.028589993502274e-06,
1376
+ "loss": 0.1108,
1377
+ "step": 1680
1378
+ },
1379
+ {
1380
+ "epoch": 29.649122807017545,
1381
+ "grad_norm": 1.5013538599014282,
1382
+ "learning_rate": 2.729044834307992e-06,
1383
+ "loss": 0.1195,
1384
+ "step": 1690
1385
+ },
1386
+ {
1387
+ "epoch": 29.82456140350877,
1388
+ "grad_norm": 1.0121512413024902,
1389
+ "learning_rate": 1.4294996751137102e-06,
1390
+ "loss": 0.1015,
1391
+ "step": 1700
1392
+ },
1393
+ {
1394
+ "epoch": 29.82456140350877,
1395
+ "eval_accuracy": 0.9422222222222222,
1396
+ "eval_f1": 0.8488372093023255,
1397
+ "eval_loss": 0.15452326834201813,
1398
+ "eval_precision": 0.8548009367681498,
1399
+ "eval_recall": 0.8429561200923787,
1400
+ "eval_runtime": 1.8193,
1401
+ "eval_samples_per_second": 123.672,
1402
+ "eval_steps_per_second": 15.94,
1403
+ "step": 1700
1404
+ },
1405
+ {
1406
+ "epoch": 30.0,
1407
+ "grad_norm": 2.770343780517578,
1408
+ "learning_rate": 1.299545159194282e-07,
1409
+ "loss": 0.1342,
1410
+ "step": 1710
1411
+ },
1412
+ {
1413
+ "epoch": 30.0,
1414
+ "step": 1710,
1415
+ "total_flos": 1.77124415883264e+17,
1416
+ "train_loss": 0.20865077226482637,
1417
+ "train_runtime": 373.9101,
1418
+ "train_samples_per_second": 72.21,
1419
+ "train_steps_per_second": 4.573
1420
+ }
1421
+ ],
1422
+ "logging_steps": 10,
1423
+ "max_steps": 1710,
1424
+ "num_input_tokens_seen": 0,
1425
+ "num_train_epochs": 30,
1426
+ "save_steps": 500,
1427
+ "stateful_callbacks": {
1428
+ "TrainerControl": {
1429
+ "args": {
1430
+ "should_epoch_stop": false,
1431
+ "should_evaluate": false,
1432
+ "should_log": false,
1433
+ "should_save": true,
1434
+ "should_training_stop": true
1435
+ },
1436
+ "attributes": {}
1437
+ }
1438
+ },
1439
+ "total_flos": 1.77124415883264e+17,
1440
+ "train_batch_size": 16,
1441
+ "trial_name": null,
1442
+ "trial_params": null
1443
+ }
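The `learning_rate` and `epoch` fields in this log can be reproduced from the step counter. The sketch below assumes a peak learning rate of 2e-4, linear decay to zero, and 171 warmup steps (10% of `max_steps`). These values are inferred from the logged numbers and are not stated in this file. It also relies on one observation about this log: the value recorded at step `s` matches the schedule evaluated at `s - 1`. The 57 steps per epoch follow directly from `max_steps` (1710) divided by `num_train_epochs` (30).

```python
# Sketch: reproduce logged learning rates and epochs from the step number.
# ASSUMPTIONS (inferred from logged values, not stated in trainer_state.json):
#   peak LR 2e-4, linear warmup over 171 steps, then linear decay to zero.
MAX_STEPS = 1710        # "max_steps" in the log
NUM_EPOCHS = 30         # "num_train_epochs" in the log
PEAK_LR = 2e-4          # assumption
WARMUP_STEPS = 171      # assumption (10% of MAX_STEPS)

STEPS_PER_EPOCH = MAX_STEPS // NUM_EPOCHS  # 57


def epoch_at(step: int) -> float:
    """Fractional epoch: optimizer steps completed divided by steps per epoch."""
    return step / STEPS_PER_EPOCH


def linear_lr(scheduler_step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero at MAX_STEPS."""
    if scheduler_step < WARMUP_STEPS:
        return PEAK_LR * scheduler_step / max(1, WARMUP_STEPS)
    remaining = (MAX_STEPS - scheduler_step) / max(1, MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, remaining)


def logged_lr(step: int) -> float:
    """In this log, the value recorded at `step` equals the schedule at step - 1."""
    return linear_lr(step - 1)


# (step, logged learning_rate, logged epoch), copied from the log above.
SAMPLES = [
    (1110, 7.810266406757635e-05, 19.473684210526315),
    (1200, 6.640675763482781e-05, 21.05263157894737),
    (1500, 2.742040285899935e-05, 26.31578947368421),
    (1710, 1.299545159194282e-07, 30.0),
]

for step, lr, epoch in SAMPLES:
    print(f"step {step}: lr {logged_lr(step):.6e} (logged {lr:.6e}), "
          f"epoch {epoch_at(step):.4f} (logged {epoch:.4f})")
```

If the real run used a different peak rate or warmup length, only the two assumed constants change. The linear shape is what the evenly spaced decrements of about 1.2995e-06 per 10 steps indicate.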
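In `log_history`, training records carry `loss`, `grad_norm`, and `learning_rate`, while evaluation records carry `eval_*` keys. Selecting the best evaluation from a slice of this log therefore means filtering on the metric key first. The sketch below uses only the eval entries for steps 1200 to 1700 shown above, plus two training entries to show the filtering. In practice the list would come from `json.load` on `trainer_state.json`. It illustrates the selection logic only. It does not claim to reproduce how this run chose its final checkpoint, since the Trainer settings for that are not shown here. Note also that the step 1100 entry above has a lower `eval_loss` (0.15210777521133423) than any entry in this slice.

```python
from typing import Any

# Slice of "log_history" copied from the log above (steps 1200-1700).
LOG_HISTORY: list[dict[str, Any]] = [
    {"step": 1200, "loss": 0.1444, "learning_rate": 6.640675763482781e-05},
    {"step": 1200, "eval_f1": 0.8471411901983664, "eval_loss": 0.155166357755661,
     "eval_precision": 0.8561320754716981},
    {"step": 1300, "eval_f1": 0.8561484918793504, "eval_loss": 0.1531468778848648,
     "eval_precision": 0.8601398601398601},
    {"step": 1400, "eval_f1": 0.8490566037735849, "eval_loss": 0.15768744051456451,
     "eval_precision": 0.8674698795180723},
    {"step": 1500, "eval_f1": 0.8472222222222222, "eval_loss": 0.15601032972335815,
     "eval_precision": 0.8491879350348028},
    {"step": 1600, "eval_f1": 0.8491879350348028, "eval_loss": 0.15734025835990906,
     "eval_precision": 0.8531468531468531},
    {"step": 1700, "eval_f1": 0.8488372093023255, "eval_loss": 0.15452326834201813,
     "eval_precision": 0.8548009367681498},
    {"step": 1710, "loss": 0.1342, "learning_rate": 1.299545159194282e-07},
]


def best_eval(history: list[dict[str, Any]], metric: str,
              greater_is_better: bool) -> dict[str, Any]:
    """Return the log entry with the best value of `metric`, ignoring entries without it."""
    evals = [entry for entry in history if metric in entry]
    if not evals:
        raise ValueError(f"no entries contain {metric!r}")
    pick = max if greater_is_better else min
    return pick(evals, key=lambda entry: entry[metric])


for metric, higher in [("eval_f1", True), ("eval_loss", False), ("eval_precision", True)]:
    entry = best_eval(LOG_HISTORY, metric, higher)
    print(f"{metric}: step {entry['step']} -> {entry[metric]}")
```

Different metrics can favor different steps. In this slice, F1 and loss both point to step 1300, while precision peaks at step 1400.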
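The throughput fields also allow a consistency check on dataset and batch sizes. Multiplying each `*_per_second` rate by its runtime recovers total samples and steps. The training side uses the final summary record and `train_batch_size: 16`. The evaluation side uses the step 1100 eval record. The sketch below assumes a single device and no gradient accumulation. The eval batch size is not in this file, so it is inferred as the smallest batch size consistent with the recovered step count, which makes it a hypothetical value.

```python
import math

# Values copied from the log above.
TRAIN_RUNTIME_S = 373.9101
TRAIN_SAMPLES_PER_S = 72.21
TRAIN_STEPS_PER_S = 4.573
NUM_EPOCHS = 30
TRAIN_BATCH_SIZE = 16

EVAL_RUNTIME_S = 1.7885          # eval record at step 1100
EVAL_SAMPLES_PER_S = 125.802
EVAL_STEPS_PER_S = 16.214


def infer_counts(samples_per_s: float, steps_per_s: float,
                 runtime_s: float) -> tuple[int, int]:
    """Recover total samples and steps by multiplying rates by wall-clock time."""
    return round(samples_per_s * runtime_s), round(steps_per_s * runtime_s)


train_samples, train_steps = infer_counts(TRAIN_SAMPLES_PER_S, TRAIN_STEPS_PER_S,
                                          TRAIN_RUNTIME_S)
samples_per_epoch = train_samples // NUM_EPOCHS
# Assumes one device and no gradient accumulation; the last batch may be partial.
steps_per_epoch = math.ceil(samples_per_epoch / TRAIN_BATCH_SIZE)

eval_samples, eval_steps = infer_counts(EVAL_SAMPLES_PER_S, EVAL_STEPS_PER_S,
                                        EVAL_RUNTIME_S)
# Hypothetical inference: smallest batch size b with ceil(eval_samples / b) == eval_steps.
eval_batch = next(b for b in range(1, eval_samples + 1)
                  if math.ceil(eval_samples / b) == eval_steps)

print(train_samples, train_steps, samples_per_epoch, steps_per_epoch)
print(eval_samples, eval_steps, eval_batch)
```

The recovered 1710 training steps and 57 steps per epoch agree with `max_steps` and with the logged `epoch` values. This suggests roughly 900 training images and 225 evaluation images per pass.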