selmamalak committed on
Commit
59d1004
1 Parent(s): 37285c2

End of training

Files changed (5)
  1. README.md +5 -5
  2. all_results.json +16 -0
  3. eval_results.json +11 -0
  4. train_results.json +8 -0
  5. trainer_state.json +1459 -0
README.md CHANGED
@@ -23,11 +23,11 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [microsoft/swin-large-patch4-window7-224-in22k](https://huggingface.co/microsoft/swin-large-patch4-window7-224-in22k) on the medmnist-v2 dataset.
 It achieves the following results on the evaluation set:
- - Loss: 0.0775
- - Accuracy: 0.9714
- - Precision: 0.9690
- - Recall: 0.9699
- - F1: 0.9692
+ - Loss: 0.1036
+ - Accuracy: 0.9649
+ - Precision: 0.9627
+ - Recall: 0.9616
+ - F1: 0.9619
 
 ## Model description
 
all_results.json ADDED
@@ -0,0 +1,16 @@
+ {
+ "epoch": 10.0,
+ "eval_accuracy": 0.9649225372698041,
+ "eval_f1": 0.9618809511152965,
+ "eval_loss": 0.10363399982452393,
+ "eval_precision": 0.9626584889347929,
+ "eval_recall": 0.9615716482526544,
+ "eval_runtime": 39.7158,
+ "eval_samples_per_second": 86.137,
+ "eval_steps_per_second": 5.388,
+ "total_flos": 2.1188849626596557e+19,
+ "train_loss": 0.34750947773775315,
+ "train_runtime": 3122.2212,
+ "train_samples_per_second": 38.303,
+ "train_steps_per_second": 0.599
+ }
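
The updated README above describes a LoRA fine-tune of the Swin-Large backbone evaluated on medmnist-v2, and all_results.json records its final metrics. As a rough, hypothetical sketch of how such a checkpoint could be loaded for inference with the Hugging Face PEFT pattern (the adapter repo id and the label count below are assumptions, not taken from this commit):

```python
# Hypothetical inference sketch for a LoRA adapter on top of the Swin-Large backbone.
# ADAPTER_REPO and NUM_LABELS are assumptions; substitute the real repo id and the
# class count of the MedMNIST subset actually used for training.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification
from peft import PeftModel

BASE_MODEL = "microsoft/swin-large-patch4-window7-224-in22k"
ADAPTER_REPO = "swin-large-patch4-window7-224-in22k-finetuned-lora-medmnistv2"  # assumed id
NUM_LABELS = 8  # assumed class count; check the adapter config

processor = AutoImageProcessor.from_pretrained(BASE_MODEL)
base = AutoModelForImageClassification.from_pretrained(
    BASE_MODEL,
    num_labels=NUM_LABELS,
    ignore_mismatched_sizes=True,  # re-initialise the classifier head for NUM_LABELS
)
model = PeftModel.from_pretrained(base, ADAPTER_REPO)  # attach LoRA weights (and saved head, if any)
model.eval()

image = Image.open("sample.png").convert("RGB")  # any RGB image; the processor resizes to 224x224
with torch.no_grad():
    logits = model(**processor(images=image, return_tensors="pt")).logits
print("predicted class index:", logits.argmax(-1).item())
```

The `ignore_mismatched_sizes=True` flag only matters if the classification head was re-initialised for the MedMNIST label set, which the adapter config would confirm.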
eval_results.json ADDED
@@ -0,0 +1,11 @@
+ {
+ "epoch": 10.0,
+ "eval_accuracy": 0.9649225372698041,
+ "eval_f1": 0.9618809511152965,
+ "eval_loss": 0.10363399982452393,
+ "eval_precision": 0.9626584889347929,
+ "eval_recall": 0.9615716482526544,
+ "eval_runtime": 39.7158,
+ "eval_samples_per_second": 86.137,
+ "eval_steps_per_second": 5.388
+ }
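
eval_results.json above reports accuracy, precision, recall, and F1 alongside the loss. A minimal sketch of how such metrics are conventionally computed from predictions and labels, assuming macro averaging over classes (the averaging scheme is not stated anywhere in this commit):

```python
# Toy sketch: computing accuracy / precision / recall / F1 from predictions.
# Assumption: macro averaging over classes, which this commit does not state explicitly.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [0, 1, 2, 2, 1, 0]  # ground-truth class indices (toy data)
y_pred = [0, 1, 2, 1, 1, 0]  # model predictions (toy data)

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```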
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 10.0,
+ "total_flos": 2.1188849626596557e+19,
+ "train_loss": 0.34750947773775315,
+ "train_runtime": 3122.2212,
+ "train_samples_per_second": 38.303,
+ "train_steps_per_second": 0.599
+ }
trainer_state.json ADDED
@@ -0,0 +1,1459 @@
1
+ {
2
+ "best_metric": 0.9713785046728972,
3
+ "best_model_checkpoint": "swin-large-patch4-window7-224-in22k-finetuned-lora-medmnistv2/checkpoint-1870",
4
+ "epoch": 10.0,
5
+ "eval_steps": 500,
6
+ "global_step": 1870,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.05,
13
+ "grad_norm": 2.163198709487915,
14
+ "learning_rate": 0.004973262032085562,
15
+ "loss": 1.5101,
16
+ "step": 10
17
+ },
18
+ {
19
+ "epoch": 0.11,
20
+ "grad_norm": 1.4286870956420898,
21
+ "learning_rate": 0.004946524064171123,
22
+ "loss": 0.8667,
23
+ "step": 20
24
+ },
25
+ {
26
+ "epoch": 0.16,
27
+ "grad_norm": 1.847931981086731,
28
+ "learning_rate": 0.004919786096256685,
29
+ "loss": 0.7414,
30
+ "step": 30
31
+ },
32
+ {
33
+ "epoch": 0.21,
34
+ "grad_norm": 1.5748757123947144,
35
+ "learning_rate": 0.004893048128342246,
36
+ "loss": 0.755,
37
+ "step": 40
38
+ },
39
+ {
40
+ "epoch": 0.27,
41
+ "grad_norm": 2.017432928085327,
42
+ "learning_rate": 0.004866310160427808,
43
+ "loss": 0.6683,
44
+ "step": 50
45
+ },
46
+ {
47
+ "epoch": 0.32,
48
+ "grad_norm": 1.5988194942474365,
49
+ "learning_rate": 0.004839572192513369,
50
+ "loss": 0.7084,
51
+ "step": 60
52
+ },
53
+ {
54
+ "epoch": 0.37,
55
+ "grad_norm": 1.7127466201782227,
56
+ "learning_rate": 0.004812834224598931,
57
+ "loss": 0.6459,
58
+ "step": 70
59
+ },
60
+ {
61
+ "epoch": 0.43,
62
+ "grad_norm": 2.1388797760009766,
63
+ "learning_rate": 0.004786096256684492,
64
+ "loss": 0.7116,
65
+ "step": 80
66
+ },
67
+ {
68
+ "epoch": 0.48,
69
+ "grad_norm": 2.5939793586730957,
70
+ "learning_rate": 0.004759358288770054,
71
+ "loss": 0.5753,
72
+ "step": 90
73
+ },
74
+ {
75
+ "epoch": 0.53,
76
+ "grad_norm": 1.463460087776184,
77
+ "learning_rate": 0.004732620320855615,
78
+ "loss": 0.5938,
79
+ "step": 100
80
+ },
81
+ {
82
+ "epoch": 0.59,
83
+ "grad_norm": 1.9902774095535278,
84
+ "learning_rate": 0.004705882352941177,
85
+ "loss": 0.5525,
86
+ "step": 110
87
+ },
88
+ {
89
+ "epoch": 0.64,
90
+ "grad_norm": 1.881441593170166,
91
+ "learning_rate": 0.004679144385026738,
92
+ "loss": 0.5788,
93
+ "step": 120
94
+ },
95
+ {
96
+ "epoch": 0.7,
97
+ "grad_norm": 2.161348581314087,
98
+ "learning_rate": 0.0046524064171123,
99
+ "loss": 0.5378,
100
+ "step": 130
101
+ },
102
+ {
103
+ "epoch": 0.75,
104
+ "grad_norm": 1.5160846710205078,
105
+ "learning_rate": 0.0046256684491978615,
106
+ "loss": 0.479,
107
+ "step": 140
108
+ },
109
+ {
110
+ "epoch": 0.8,
111
+ "grad_norm": 1.4215080738067627,
112
+ "learning_rate": 0.004598930481283423,
113
+ "loss": 0.5123,
114
+ "step": 150
115
+ },
116
+ {
117
+ "epoch": 0.86,
118
+ "grad_norm": 1.2568920850753784,
119
+ "learning_rate": 0.004572192513368984,
120
+ "loss": 0.5499,
121
+ "step": 160
122
+ },
123
+ {
124
+ "epoch": 0.91,
125
+ "grad_norm": 0.9570059180259705,
126
+ "learning_rate": 0.004545454545454545,
127
+ "loss": 0.4845,
128
+ "step": 170
129
+ },
130
+ {
131
+ "epoch": 0.96,
132
+ "grad_norm": 2.3021810054779053,
133
+ "learning_rate": 0.004518716577540107,
134
+ "loss": 0.5141,
135
+ "step": 180
136
+ },
137
+ {
138
+ "epoch": 1.0,
139
+ "eval_accuracy": 0.9065420560747663,
140
+ "eval_f1": 0.8872552707825272,
141
+ "eval_loss": 0.2832600176334381,
142
+ "eval_precision": 0.8954019032094056,
143
+ "eval_recall": 0.8949326095168356,
144
+ "eval_runtime": 19.6658,
145
+ "eval_samples_per_second": 87.054,
146
+ "eval_steps_per_second": 5.441,
147
+ "step": 187
148
+ },
149
+ {
150
+ "epoch": 1.02,
151
+ "grad_norm": 1.8552567958831787,
152
+ "learning_rate": 0.004491978609625669,
153
+ "loss": 0.4441,
154
+ "step": 190
155
+ },
156
+ {
157
+ "epoch": 1.07,
158
+ "grad_norm": 2.53872013092041,
159
+ "learning_rate": 0.00446524064171123,
160
+ "loss": 0.4436,
161
+ "step": 200
162
+ },
163
+ {
164
+ "epoch": 1.12,
165
+ "grad_norm": 1.3826777935028076,
166
+ "learning_rate": 0.004438502673796791,
167
+ "loss": 0.4632,
168
+ "step": 210
169
+ },
170
+ {
171
+ "epoch": 1.18,
172
+ "grad_norm": 2.2216227054595947,
173
+ "learning_rate": 0.004411764705882353,
174
+ "loss": 0.4429,
175
+ "step": 220
176
+ },
177
+ {
178
+ "epoch": 1.23,
179
+ "grad_norm": 1.8521422147750854,
180
+ "learning_rate": 0.004385026737967914,
181
+ "loss": 0.4472,
182
+ "step": 230
183
+ },
184
+ {
185
+ "epoch": 1.28,
186
+ "grad_norm": 2.058058977127075,
187
+ "learning_rate": 0.00436096256684492,
188
+ "loss": 0.4757,
189
+ "step": 240
190
+ },
191
+ {
192
+ "epoch": 1.34,
193
+ "grad_norm": 1.1437183618545532,
194
+ "learning_rate": 0.004334224598930481,
195
+ "loss": 0.3436,
196
+ "step": 250
197
+ },
198
+ {
199
+ "epoch": 1.39,
200
+ "grad_norm": 1.761400580406189,
201
+ "learning_rate": 0.0043074866310160425,
202
+ "loss": 0.4958,
203
+ "step": 260
204
+ },
205
+ {
206
+ "epoch": 1.44,
207
+ "grad_norm": 1.4134129285812378,
208
+ "learning_rate": 0.004280748663101605,
209
+ "loss": 0.4519,
210
+ "step": 270
211
+ },
212
+ {
213
+ "epoch": 1.5,
214
+ "grad_norm": 1.7341545820236206,
215
+ "learning_rate": 0.004254010695187166,
216
+ "loss": 0.528,
217
+ "step": 280
218
+ },
219
+ {
220
+ "epoch": 1.55,
221
+ "grad_norm": 2.980020761489868,
222
+ "learning_rate": 0.004227272727272727,
223
+ "loss": 0.5021,
224
+ "step": 290
225
+ },
226
+ {
227
+ "epoch": 1.6,
228
+ "grad_norm": 0.6755030751228333,
229
+ "learning_rate": 0.004200534759358289,
230
+ "loss": 0.4601,
231
+ "step": 300
232
+ },
233
+ {
234
+ "epoch": 1.66,
235
+ "grad_norm": 1.8686202764511108,
236
+ "learning_rate": 0.00417379679144385,
237
+ "loss": 0.4433,
238
+ "step": 310
239
+ },
240
+ {
241
+ "epoch": 1.71,
242
+ "grad_norm": 1.371077299118042,
243
+ "learning_rate": 0.004147058823529412,
244
+ "loss": 0.4323,
245
+ "step": 320
246
+ },
247
+ {
248
+ "epoch": 1.76,
249
+ "grad_norm": 1.0771093368530273,
250
+ "learning_rate": 0.004120320855614973,
251
+ "loss": 0.4251,
252
+ "step": 330
253
+ },
254
+ {
255
+ "epoch": 1.82,
256
+ "grad_norm": 1.185023546218872,
257
+ "learning_rate": 0.004093582887700535,
258
+ "loss": 0.4881,
259
+ "step": 340
260
+ },
261
+ {
262
+ "epoch": 1.87,
263
+ "grad_norm": 0.9843281507492065,
264
+ "learning_rate": 0.004066844919786096,
265
+ "loss": 0.4483,
266
+ "step": 350
267
+ },
268
+ {
269
+ "epoch": 1.93,
270
+ "grad_norm": 1.6477869749069214,
271
+ "learning_rate": 0.004040106951871658,
272
+ "loss": 0.4956,
273
+ "step": 360
274
+ },
275
+ {
276
+ "epoch": 1.98,
277
+ "grad_norm": 1.7044633626937866,
278
+ "learning_rate": 0.004013368983957219,
279
+ "loss": 0.4176,
280
+ "step": 370
281
+ },
282
+ {
283
+ "epoch": 2.0,
284
+ "eval_accuracy": 0.9310747663551402,
285
+ "eval_f1": 0.9182291375322846,
286
+ "eval_loss": 0.198581263422966,
287
+ "eval_precision": 0.924344363515161,
288
+ "eval_recall": 0.9209393532374213,
289
+ "eval_runtime": 19.6732,
290
+ "eval_samples_per_second": 87.022,
291
+ "eval_steps_per_second": 5.439,
292
+ "step": 374
293
+ },
294
+ {
295
+ "epoch": 2.03,
296
+ "grad_norm": 1.662027359008789,
297
+ "learning_rate": 0.003986631016042781,
298
+ "loss": 0.4022,
299
+ "step": 380
300
+ },
301
+ {
302
+ "epoch": 2.09,
303
+ "grad_norm": 1.188351035118103,
304
+ "learning_rate": 0.003959893048128342,
305
+ "loss": 0.3758,
306
+ "step": 390
307
+ },
308
+ {
309
+ "epoch": 2.14,
310
+ "grad_norm": 2.2225048542022705,
311
+ "learning_rate": 0.003933155080213904,
312
+ "loss": 0.4491,
313
+ "step": 400
314
+ },
315
+ {
316
+ "epoch": 2.19,
317
+ "grad_norm": 1.683356761932373,
318
+ "learning_rate": 0.0039064171122994654,
319
+ "loss": 0.3647,
320
+ "step": 410
321
+ },
322
+ {
323
+ "epoch": 2.25,
324
+ "grad_norm": 1.7646687030792236,
325
+ "learning_rate": 0.0038796791443850265,
326
+ "loss": 0.4666,
327
+ "step": 420
328
+ },
329
+ {
330
+ "epoch": 2.3,
331
+ "grad_norm": 2.173644781112671,
332
+ "learning_rate": 0.0038529411764705885,
333
+ "loss": 0.4314,
334
+ "step": 430
335
+ },
336
+ {
337
+ "epoch": 2.35,
338
+ "grad_norm": 0.8064551949501038,
339
+ "learning_rate": 0.00382620320855615,
340
+ "loss": 0.3944,
341
+ "step": 440
342
+ },
343
+ {
344
+ "epoch": 2.41,
345
+ "grad_norm": 0.9698677062988281,
346
+ "learning_rate": 0.003799465240641711,
347
+ "loss": 0.4314,
348
+ "step": 450
349
+ },
350
+ {
351
+ "epoch": 2.46,
352
+ "grad_norm": 0.9321346879005432,
353
+ "learning_rate": 0.0037727272727272726,
354
+ "loss": 0.467,
355
+ "step": 460
356
+ },
357
+ {
358
+ "epoch": 2.51,
359
+ "grad_norm": 2.6592769622802734,
360
+ "learning_rate": 0.003745989304812834,
361
+ "loss": 0.4024,
362
+ "step": 470
363
+ },
364
+ {
365
+ "epoch": 2.57,
366
+ "grad_norm": 1.7124016284942627,
367
+ "learning_rate": 0.003719251336898396,
368
+ "loss": 0.3283,
369
+ "step": 480
370
+ },
371
+ {
372
+ "epoch": 2.62,
373
+ "grad_norm": 3.178034543991089,
374
+ "learning_rate": 0.0036925133689839572,
375
+ "loss": 0.4377,
376
+ "step": 490
377
+ },
378
+ {
379
+ "epoch": 2.67,
380
+ "grad_norm": 1.2681751251220703,
381
+ "learning_rate": 0.0036657754010695188,
382
+ "loss": 0.3866,
383
+ "step": 500
384
+ },
385
+ {
386
+ "epoch": 2.73,
387
+ "grad_norm": 1.1923668384552002,
388
+ "learning_rate": 0.0036390374331550803,
389
+ "loss": 0.3366,
390
+ "step": 510
391
+ },
392
+ {
393
+ "epoch": 2.78,
394
+ "grad_norm": 1.499803066253662,
395
+ "learning_rate": 0.0036122994652406414,
396
+ "loss": 0.4578,
397
+ "step": 520
398
+ },
399
+ {
400
+ "epoch": 2.83,
401
+ "grad_norm": 1.887222409248352,
402
+ "learning_rate": 0.0035855614973262034,
403
+ "loss": 0.4189,
404
+ "step": 530
405
+ },
406
+ {
407
+ "epoch": 2.89,
408
+ "grad_norm": 1.3592134714126587,
409
+ "learning_rate": 0.003558823529411765,
410
+ "loss": 0.4008,
411
+ "step": 540
412
+ },
413
+ {
414
+ "epoch": 2.94,
415
+ "grad_norm": 3.0257527828216553,
416
+ "learning_rate": 0.0035320855614973264,
417
+ "loss": 0.3774,
418
+ "step": 550
419
+ },
420
+ {
421
+ "epoch": 2.99,
422
+ "grad_norm": 1.093493103981018,
423
+ "learning_rate": 0.0035053475935828875,
424
+ "loss": 0.3454,
425
+ "step": 560
426
+ },
427
+ {
428
+ "epoch": 3.0,
429
+ "eval_accuracy": 0.9503504672897196,
430
+ "eval_f1": 0.9402807880914787,
431
+ "eval_loss": 0.15674300491809845,
432
+ "eval_precision": 0.9426615409260363,
433
+ "eval_recall": 0.9397047025483766,
434
+ "eval_runtime": 19.5644,
435
+ "eval_samples_per_second": 87.506,
436
+ "eval_steps_per_second": 5.469,
437
+ "step": 561
438
+ },
439
+ {
440
+ "epoch": 3.05,
441
+ "grad_norm": 1.7053543329238892,
442
+ "learning_rate": 0.003478609625668449,
443
+ "loss": 0.3776,
444
+ "step": 570
445
+ },
446
+ {
447
+ "epoch": 3.1,
448
+ "grad_norm": 1.5041882991790771,
449
+ "learning_rate": 0.003451871657754011,
450
+ "loss": 0.4058,
451
+ "step": 580
452
+ },
453
+ {
454
+ "epoch": 3.16,
455
+ "grad_norm": 1.3619967699050903,
456
+ "learning_rate": 0.0034251336898395725,
457
+ "loss": 0.3646,
458
+ "step": 590
459
+ },
460
+ {
461
+ "epoch": 3.21,
462
+ "grad_norm": 1.1415998935699463,
463
+ "learning_rate": 0.0033983957219251336,
464
+ "loss": 0.4906,
465
+ "step": 600
466
+ },
467
+ {
468
+ "epoch": 3.26,
469
+ "grad_norm": 1.6870795488357544,
470
+ "learning_rate": 0.003371657754010695,
471
+ "loss": 0.3828,
472
+ "step": 610
473
+ },
474
+ {
475
+ "epoch": 3.32,
476
+ "grad_norm": 1.0538561344146729,
477
+ "learning_rate": 0.0033449197860962567,
478
+ "loss": 0.3728,
479
+ "step": 620
480
+ },
481
+ {
482
+ "epoch": 3.37,
483
+ "grad_norm": 2.340454339981079,
484
+ "learning_rate": 0.0033181818181818186,
485
+ "loss": 0.3809,
486
+ "step": 630
487
+ },
488
+ {
489
+ "epoch": 3.42,
490
+ "grad_norm": 2.317230224609375,
491
+ "learning_rate": 0.0032914438502673797,
492
+ "loss": 0.3391,
493
+ "step": 640
494
+ },
495
+ {
496
+ "epoch": 3.48,
497
+ "grad_norm": 1.242281436920166,
498
+ "learning_rate": 0.0032647058823529413,
499
+ "loss": 0.4091,
500
+ "step": 650
501
+ },
502
+ {
503
+ "epoch": 3.53,
504
+ "grad_norm": 1.23116934299469,
505
+ "learning_rate": 0.003237967914438503,
506
+ "loss": 0.3592,
507
+ "step": 660
508
+ },
509
+ {
510
+ "epoch": 3.58,
511
+ "grad_norm": 1.117090106010437,
512
+ "learning_rate": 0.003211229946524064,
513
+ "loss": 0.3867,
514
+ "step": 670
515
+ },
516
+ {
517
+ "epoch": 3.64,
518
+ "grad_norm": 1.0917716026306152,
519
+ "learning_rate": 0.0031844919786096254,
520
+ "loss": 0.4386,
521
+ "step": 680
522
+ },
523
+ {
524
+ "epoch": 3.69,
525
+ "grad_norm": 1.2080508470535278,
526
+ "learning_rate": 0.0031577540106951874,
527
+ "loss": 0.3466,
528
+ "step": 690
529
+ },
530
+ {
531
+ "epoch": 3.74,
532
+ "grad_norm": 1.695580244064331,
533
+ "learning_rate": 0.003131016042780749,
534
+ "loss": 0.3147,
535
+ "step": 700
536
+ },
537
+ {
538
+ "epoch": 3.8,
539
+ "grad_norm": 1.1604491472244263,
540
+ "learning_rate": 0.00310427807486631,
541
+ "loss": 0.3585,
542
+ "step": 710
543
+ },
544
+ {
545
+ "epoch": 3.85,
546
+ "grad_norm": 1.8931636810302734,
547
+ "learning_rate": 0.0030775401069518715,
548
+ "loss": 0.3578,
549
+ "step": 720
550
+ },
551
+ {
552
+ "epoch": 3.9,
553
+ "grad_norm": 1.4620869159698486,
554
+ "learning_rate": 0.003050802139037433,
555
+ "loss": 0.3522,
556
+ "step": 730
557
+ },
558
+ {
559
+ "epoch": 3.96,
560
+ "grad_norm": 1.3944414854049683,
561
+ "learning_rate": 0.003024064171122995,
562
+ "loss": 0.3228,
563
+ "step": 740
564
+ },
565
+ {
566
+ "epoch": 4.0,
567
+ "eval_accuracy": 0.9357476635514018,
568
+ "eval_f1": 0.9283408808159395,
569
+ "eval_loss": 0.1848856657743454,
570
+ "eval_precision": 0.9231661406901872,
571
+ "eval_recall": 0.9426484043891363,
572
+ "eval_runtime": 19.683,
573
+ "eval_samples_per_second": 86.979,
574
+ "eval_steps_per_second": 5.436,
575
+ "step": 748
576
+ },
577
+ {
578
+ "epoch": 4.01,
579
+ "grad_norm": 1.7614619731903076,
580
+ "learning_rate": 0.002997326203208556,
581
+ "loss": 0.3463,
582
+ "step": 750
583
+ },
584
+ {
585
+ "epoch": 4.06,
586
+ "grad_norm": 2.866691827774048,
587
+ "learning_rate": 0.0029705882352941177,
588
+ "loss": 0.3431,
589
+ "step": 760
590
+ },
591
+ {
592
+ "epoch": 4.12,
593
+ "grad_norm": 3.0871615409851074,
594
+ "learning_rate": 0.002943850267379679,
595
+ "loss": 0.4329,
596
+ "step": 770
597
+ },
598
+ {
599
+ "epoch": 4.17,
600
+ "grad_norm": 1.3399722576141357,
601
+ "learning_rate": 0.0029171122994652403,
602
+ "loss": 0.3992,
603
+ "step": 780
604
+ },
605
+ {
606
+ "epoch": 4.22,
607
+ "grad_norm": 1.440559983253479,
608
+ "learning_rate": 0.0028903743315508022,
609
+ "loss": 0.3333,
610
+ "step": 790
611
+ },
612
+ {
613
+ "epoch": 4.28,
614
+ "grad_norm": 1.4606270790100098,
615
+ "learning_rate": 0.0028636363636363638,
616
+ "loss": 0.3108,
617
+ "step": 800
618
+ },
619
+ {
620
+ "epoch": 4.33,
621
+ "grad_norm": 2.4641544818878174,
622
+ "learning_rate": 0.0028368983957219253,
623
+ "loss": 0.3436,
624
+ "step": 810
625
+ },
626
+ {
627
+ "epoch": 4.39,
628
+ "grad_norm": 1.9653208255767822,
629
+ "learning_rate": 0.0028101604278074864,
630
+ "loss": 0.2766,
631
+ "step": 820
632
+ },
633
+ {
634
+ "epoch": 4.44,
635
+ "grad_norm": 1.0840091705322266,
636
+ "learning_rate": 0.002783422459893048,
637
+ "loss": 0.2568,
638
+ "step": 830
639
+ },
640
+ {
641
+ "epoch": 4.49,
642
+ "grad_norm": 1.0625332593917847,
643
+ "learning_rate": 0.00275668449197861,
644
+ "loss": 0.3366,
645
+ "step": 840
646
+ },
647
+ {
648
+ "epoch": 4.55,
649
+ "grad_norm": 0.9171143174171448,
650
+ "learning_rate": 0.0027299465240641714,
651
+ "loss": 0.339,
652
+ "step": 850
653
+ },
654
+ {
655
+ "epoch": 4.6,
656
+ "grad_norm": 1.6296868324279785,
657
+ "learning_rate": 0.0027032085561497325,
658
+ "loss": 0.359,
659
+ "step": 860
660
+ },
661
+ {
662
+ "epoch": 4.65,
663
+ "grad_norm": 1.949312448501587,
664
+ "learning_rate": 0.002676470588235294,
665
+ "loss": 0.3529,
666
+ "step": 870
667
+ },
668
+ {
669
+ "epoch": 4.71,
670
+ "grad_norm": 1.6241270303726196,
671
+ "learning_rate": 0.0026497326203208556,
672
+ "loss": 0.3364,
673
+ "step": 880
674
+ },
675
+ {
676
+ "epoch": 4.76,
677
+ "grad_norm": 2.172145366668701,
678
+ "learning_rate": 0.0026229946524064175,
679
+ "loss": 0.3374,
680
+ "step": 890
681
+ },
682
+ {
683
+ "epoch": 4.81,
684
+ "grad_norm": 3.377912998199463,
685
+ "learning_rate": 0.0025962566844919786,
686
+ "loss": 0.3555,
687
+ "step": 900
688
+ },
689
+ {
690
+ "epoch": 4.87,
691
+ "grad_norm": 1.194082260131836,
692
+ "learning_rate": 0.00256951871657754,
693
+ "loss": 0.3354,
694
+ "step": 910
695
+ },
696
+ {
697
+ "epoch": 4.92,
698
+ "grad_norm": 1.774932861328125,
699
+ "learning_rate": 0.0025427807486631017,
700
+ "loss": 0.3728,
701
+ "step": 920
702
+ },
703
+ {
704
+ "epoch": 4.97,
705
+ "grad_norm": 0.9065486192703247,
706
+ "learning_rate": 0.002516042780748663,
707
+ "loss": 0.3382,
708
+ "step": 930
709
+ },
710
+ {
711
+ "epoch": 5.0,
712
+ "eval_accuracy": 0.9398364485981309,
713
+ "eval_f1": 0.9320964504504674,
714
+ "eval_loss": 0.16266803443431854,
715
+ "eval_precision": 0.9301560138584124,
716
+ "eval_recall": 0.9396981551324834,
717
+ "eval_runtime": 19.6531,
718
+ "eval_samples_per_second": 87.111,
719
+ "eval_steps_per_second": 5.444,
720
+ "step": 935
721
+ },
722
+ {
723
+ "epoch": 5.03,
724
+ "grad_norm": 0.8373203873634338,
725
+ "learning_rate": 0.0024893048128342248,
726
+ "loss": 0.3115,
727
+ "step": 940
728
+ },
729
+ {
730
+ "epoch": 5.08,
731
+ "grad_norm": 1.6470876932144165,
732
+ "learning_rate": 0.002462566844919786,
733
+ "loss": 0.3746,
734
+ "step": 950
735
+ },
736
+ {
737
+ "epoch": 5.13,
738
+ "grad_norm": 2.556999444961548,
739
+ "learning_rate": 0.002435828877005348,
740
+ "loss": 0.3411,
741
+ "step": 960
742
+ },
743
+ {
744
+ "epoch": 5.19,
745
+ "grad_norm": 1.753217101097107,
746
+ "learning_rate": 0.002409090909090909,
747
+ "loss": 0.3095,
748
+ "step": 970
749
+ },
750
+ {
751
+ "epoch": 5.24,
752
+ "grad_norm": 2.667759895324707,
753
+ "learning_rate": 0.0023823529411764704,
754
+ "loss": 0.3358,
755
+ "step": 980
756
+ },
757
+ {
758
+ "epoch": 5.29,
759
+ "grad_norm": 1.6711212396621704,
760
+ "learning_rate": 0.002355614973262032,
761
+ "loss": 0.3263,
762
+ "step": 990
763
+ },
764
+ {
765
+ "epoch": 5.35,
766
+ "grad_norm": 1.8793816566467285,
767
+ "learning_rate": 0.0023288770053475935,
768
+ "loss": 0.3245,
769
+ "step": 1000
770
+ },
771
+ {
772
+ "epoch": 5.4,
773
+ "grad_norm": 1.3059521913528442,
774
+ "learning_rate": 0.002302139037433155,
775
+ "loss": 0.2904,
776
+ "step": 1010
777
+ },
778
+ {
779
+ "epoch": 5.45,
780
+ "grad_norm": 1.765958309173584,
781
+ "learning_rate": 0.0022754010695187166,
782
+ "loss": 0.3424,
783
+ "step": 1020
784
+ },
785
+ {
786
+ "epoch": 5.51,
787
+ "grad_norm": 0.9322473406791687,
788
+ "learning_rate": 0.002248663101604278,
789
+ "loss": 0.3716,
790
+ "step": 1030
791
+ },
792
+ {
793
+ "epoch": 5.56,
794
+ "grad_norm": 2.082515239715576,
795
+ "learning_rate": 0.0022219251336898396,
796
+ "loss": 0.2967,
797
+ "step": 1040
798
+ },
799
+ {
800
+ "epoch": 5.61,
801
+ "grad_norm": 1.6903836727142334,
802
+ "learning_rate": 0.002195187165775401,
803
+ "loss": 0.3244,
804
+ "step": 1050
805
+ },
806
+ {
807
+ "epoch": 5.67,
808
+ "grad_norm": 1.1631466150283813,
809
+ "learning_rate": 0.0021684491978609627,
810
+ "loss": 0.3141,
811
+ "step": 1060
812
+ },
813
+ {
814
+ "epoch": 5.72,
815
+ "grad_norm": 2.086376428604126,
816
+ "learning_rate": 0.002141711229946524,
817
+ "loss": 0.3211,
818
+ "step": 1070
819
+ },
820
+ {
821
+ "epoch": 5.78,
822
+ "grad_norm": 1.709187626838684,
823
+ "learning_rate": 0.0021149732620320857,
824
+ "loss": 0.3039,
825
+ "step": 1080
826
+ },
827
+ {
828
+ "epoch": 5.83,
829
+ "grad_norm": 1.7365305423736572,
830
+ "learning_rate": 0.0020882352941176473,
831
+ "loss": 0.2937,
832
+ "step": 1090
833
+ },
834
+ {
835
+ "epoch": 5.88,
836
+ "grad_norm": 1.2648741006851196,
837
+ "learning_rate": 0.0020614973262032084,
838
+ "loss": 0.2951,
839
+ "step": 1100
840
+ },
841
+ {
842
+ "epoch": 5.94,
843
+ "grad_norm": 1.2121895551681519,
844
+ "learning_rate": 0.00203475935828877,
845
+ "loss": 0.242,
846
+ "step": 1110
847
+ },
848
+ {
849
+ "epoch": 5.99,
850
+ "grad_norm": 1.6397563219070435,
851
+ "learning_rate": 0.0020080213903743314,
852
+ "loss": 0.3363,
853
+ "step": 1120
854
+ },
855
+ {
856
+ "epoch": 6.0,
857
+ "eval_accuracy": 0.9509345794392523,
858
+ "eval_f1": 0.9456282248184136,
859
+ "eval_loss": 0.14138737320899963,
860
+ "eval_precision": 0.9497944760971885,
861
+ "eval_recall": 0.9441674024122191,
862
+ "eval_runtime": 19.937,
863
+ "eval_samples_per_second": 85.87,
864
+ "eval_steps_per_second": 5.367,
865
+ "step": 1122
866
+ },
867
+ {
868
+ "epoch": 6.04,
869
+ "grad_norm": 1.3460767269134521,
870
+ "learning_rate": 0.001981283422459893,
871
+ "loss": 0.3134,
872
+ "step": 1130
873
+ },
874
+ {
875
+ "epoch": 6.1,
876
+ "grad_norm": 1.2124683856964111,
877
+ "learning_rate": 0.0019545454545454545,
878
+ "loss": 0.3028,
879
+ "step": 1140
880
+ },
881
+ {
882
+ "epoch": 6.15,
883
+ "grad_norm": 0.8806934952735901,
884
+ "learning_rate": 0.001927807486631016,
885
+ "loss": 0.2589,
886
+ "step": 1150
887
+ },
888
+ {
889
+ "epoch": 6.2,
890
+ "grad_norm": 1.059187889099121,
891
+ "learning_rate": 0.0019010695187165775,
892
+ "loss": 0.2888,
893
+ "step": 1160
894
+ },
895
+ {
896
+ "epoch": 6.26,
897
+ "grad_norm": 2.5121827125549316,
898
+ "learning_rate": 0.001874331550802139,
899
+ "loss": 0.2741,
900
+ "step": 1170
901
+ },
902
+ {
903
+ "epoch": 6.31,
904
+ "grad_norm": 1.0052329301834106,
905
+ "learning_rate": 0.0018475935828877006,
906
+ "loss": 0.3519,
907
+ "step": 1180
908
+ },
909
+ {
910
+ "epoch": 6.36,
911
+ "grad_norm": 1.4301072359085083,
912
+ "learning_rate": 0.0018208556149732621,
913
+ "loss": 0.2937,
914
+ "step": 1190
915
+ },
916
+ {
917
+ "epoch": 6.42,
918
+ "grad_norm": 1.09031343460083,
919
+ "learning_rate": 0.0017941176470588236,
920
+ "loss": 0.2252,
921
+ "step": 1200
922
+ },
923
+ {
924
+ "epoch": 6.47,
925
+ "grad_norm": 1.9657083749771118,
926
+ "learning_rate": 0.001767379679144385,
927
+ "loss": 0.267,
928
+ "step": 1210
929
+ },
930
+ {
931
+ "epoch": 6.52,
932
+ "grad_norm": 3.7427196502685547,
933
+ "learning_rate": 0.0017406417112299467,
934
+ "loss": 0.2493,
935
+ "step": 1220
936
+ },
937
+ {
938
+ "epoch": 6.58,
939
+ "grad_norm": 1.7291096448898315,
940
+ "learning_rate": 0.001713903743315508,
941
+ "loss": 0.2558,
942
+ "step": 1230
943
+ },
944
+ {
945
+ "epoch": 6.63,
946
+ "grad_norm": 2.8834567070007324,
947
+ "learning_rate": 0.0016871657754010698,
948
+ "loss": 0.3167,
949
+ "step": 1240
950
+ },
951
+ {
952
+ "epoch": 6.68,
953
+ "grad_norm": 1.6702009439468384,
954
+ "learning_rate": 0.001660427807486631,
955
+ "loss": 0.274,
956
+ "step": 1250
957
+ },
958
+ {
959
+ "epoch": 6.74,
960
+ "grad_norm": 1.7623697519302368,
961
+ "learning_rate": 0.0016336898395721924,
962
+ "loss": 0.2481,
963
+ "step": 1260
964
+ },
965
+ {
966
+ "epoch": 6.79,
967
+ "grad_norm": 1.8855972290039062,
968
+ "learning_rate": 0.0016069518716577541,
969
+ "loss": 0.2424,
970
+ "step": 1270
971
+ },
972
+ {
973
+ "epoch": 6.84,
974
+ "grad_norm": 1.7909148931503296,
975
+ "learning_rate": 0.0015802139037433154,
976
+ "loss": 0.2361,
977
+ "step": 1280
978
+ },
979
+ {
980
+ "epoch": 6.9,
981
+ "grad_norm": 1.424047589302063,
982
+ "learning_rate": 0.001553475935828877,
983
+ "loss": 0.2834,
984
+ "step": 1290
985
+ },
986
+ {
987
+ "epoch": 6.95,
988
+ "grad_norm": 1.3470966815948486,
989
+ "learning_rate": 0.0015267379679144385,
990
+ "loss": 0.2981,
991
+ "step": 1300
992
+ },
993
+ {
994
+ "epoch": 7.0,
995
+ "eval_accuracy": 0.9544392523364486,
996
+ "eval_f1": 0.9480272544883982,
997
+ "eval_loss": 0.11172817647457123,
998
+ "eval_precision": 0.9458066711610336,
999
+ "eval_recall": 0.9541586489707353,
1000
+ "eval_runtime": 19.6986,
1001
+ "eval_samples_per_second": 86.91,
1002
+ "eval_steps_per_second": 5.432,
1003
+ "step": 1309
1004
+ },
1005
+ {
1006
+ "epoch": 7.01,
1007
+ "grad_norm": 1.9716545343399048,
1008
+ "learning_rate": 0.0015,
1009
+ "loss": 0.2591,
1010
+ "step": 1310
1011
+ },
1012
+ {
1013
+ "epoch": 7.06,
1014
+ "grad_norm": 2.347787618637085,
1015
+ "learning_rate": 0.0014732620320855616,
1016
+ "loss": 0.2324,
1017
+ "step": 1320
1018
+ },
1019
+ {
1020
+ "epoch": 7.11,
1021
+ "grad_norm": 1.5514649152755737,
1022
+ "learning_rate": 0.001446524064171123,
1023
+ "loss": 0.2163,
1024
+ "step": 1330
1025
+ },
1026
+ {
1027
+ "epoch": 7.17,
1028
+ "grad_norm": 3.073544979095459,
1029
+ "learning_rate": 0.0014197860962566844,
1030
+ "loss": 0.2889,
1031
+ "step": 1340
1032
+ },
1033
+ {
1034
+ "epoch": 7.22,
1035
+ "grad_norm": 1.5972115993499756,
1036
+ "learning_rate": 0.0013930481283422461,
1037
+ "loss": 0.2589,
1038
+ "step": 1350
1039
+ },
1040
+ {
1041
+ "epoch": 7.27,
1042
+ "grad_norm": 1.8408401012420654,
1043
+ "learning_rate": 0.0013663101604278075,
1044
+ "loss": 0.2333,
1045
+ "step": 1360
1046
+ },
1047
+ {
1048
+ "epoch": 7.33,
1049
+ "grad_norm": 1.3704335689544678,
1050
+ "learning_rate": 0.0013395721925133692,
1051
+ "loss": 0.2103,
1052
+ "step": 1370
1053
+ },
1054
+ {
1055
+ "epoch": 7.38,
1056
+ "grad_norm": 3.6621859073638916,
1057
+ "learning_rate": 0.0013128342245989305,
1058
+ "loss": 0.2413,
1059
+ "step": 1380
1060
+ },
1061
+ {
1062
+ "epoch": 7.43,
1063
+ "grad_norm": 1.345258355140686,
1064
+ "learning_rate": 0.0012860962566844918,
1065
+ "loss": 0.2444,
1066
+ "step": 1390
1067
+ },
1068
+ {
1069
+ "epoch": 7.49,
1070
+ "grad_norm": 1.354202389717102,
1071
+ "learning_rate": 0.0012593582887700536,
1072
+ "loss": 0.2288,
1073
+ "step": 1400
1074
+ },
1075
+ {
1076
+ "epoch": 7.54,
1077
+ "grad_norm": 0.983450174331665,
1078
+ "learning_rate": 0.0012326203208556149,
1079
+ "loss": 0.2995,
1080
+ "step": 1410
1081
+ },
1082
+ {
1083
+ "epoch": 7.59,
1084
+ "grad_norm": 1.7251689434051514,
1085
+ "learning_rate": 0.0012058823529411764,
1086
+ "loss": 0.2898,
1087
+ "step": 1420
1088
+ },
1089
+ {
1090
+ "epoch": 7.65,
1091
+ "grad_norm": 1.4366217851638794,
1092
+ "learning_rate": 0.001179144385026738,
1093
+ "loss": 0.2509,
1094
+ "step": 1430
1095
+ },
1096
+ {
1097
+ "epoch": 7.7,
1098
+ "grad_norm": 1.6491020917892456,
1099
+ "learning_rate": 0.0011524064171122995,
1100
+ "loss": 0.2191,
1101
+ "step": 1440
1102
+ },
1103
+ {
1104
+ "epoch": 7.75,
1105
+ "grad_norm": 1.4462454319000244,
1106
+ "learning_rate": 0.001125668449197861,
1107
+ "loss": 0.2307,
1108
+ "step": 1450
1109
+ },
1110
+ {
1111
+ "epoch": 7.81,
1112
+ "grad_norm": 1.5503740310668945,
1113
+ "learning_rate": 0.0010989304812834225,
1114
+ "loss": 0.2167,
1115
+ "step": 1460
1116
+ },
1117
+ {
1118
+ "epoch": 7.86,
1119
+ "grad_norm": 1.5065810680389404,
1120
+ "learning_rate": 0.001072192513368984,
1121
+ "loss": 0.3377,
1122
+ "step": 1470
1123
+ },
1124
+ {
1125
+ "epoch": 7.91,
1126
+ "grad_norm": 1.3696374893188477,
1127
+ "learning_rate": 0.0010454545454545454,
1128
+ "loss": 0.24,
1129
+ "step": 1480
1130
+ },
1131
+ {
1132
+ "epoch": 7.97,
1133
+ "grad_norm": 0.9576804041862488,
1134
+ "learning_rate": 0.001018716577540107,
1135
+ "loss": 0.2214,
1136
+ "step": 1490
1137
+ },
1138
+ {
1139
+ "epoch": 8.0,
1140
+ "eval_accuracy": 0.9649532710280374,
1141
+ "eval_f1": 0.9609815836403053,
1142
+ "eval_loss": 0.11309263855218887,
1143
+ "eval_precision": 0.9642473014777337,
1144
+ "eval_recall": 0.9584474051621633,
1145
+ "eval_runtime": 19.6442,
1146
+ "eval_samples_per_second": 87.15,
1147
+ "eval_steps_per_second": 5.447,
1148
+ "step": 1496
1149
+ },
1150
+ {
1151
+ "epoch": 8.02,
1152
+ "grad_norm": 1.6305640935897827,
1153
+ "learning_rate": 0.0009919786096256684,
1154
+ "loss": 0.2645,
1155
+ "step": 1500
1156
+ },
1157
+ {
1158
+ "epoch": 8.07,
1159
+ "grad_norm": 1.0711798667907715,
1160
+ "learning_rate": 0.00096524064171123,
1161
+ "loss": 0.2063,
1162
+ "step": 1510
1163
+ },
1164
+ {
1165
+ "epoch": 8.13,
1166
+ "grad_norm": 1.2606171369552612,
1167
+ "learning_rate": 0.0009385026737967915,
1168
+ "loss": 0.1904,
1169
+ "step": 1520
1170
+ },
1171
+ {
1172
+ "epoch": 8.18,
1173
+ "grad_norm": 0.8554580807685852,
1174
+ "learning_rate": 0.0009117647058823529,
1175
+ "loss": 0.2078,
1176
+ "step": 1530
1177
+ },
1178
+ {
1179
+ "epoch": 8.24,
1180
+ "grad_norm": 1.0638494491577148,
1181
+ "learning_rate": 0.0008850267379679144,
1182
+ "loss": 0.2129,
1183
+ "step": 1540
1184
+ },
1185
+ {
1186
+ "epoch": 8.29,
1187
+ "grad_norm": 1.4322021007537842,
1188
+ "learning_rate": 0.000858288770053476,
1189
+ "loss": 0.2761,
1190
+ "step": 1550
1191
+ },
1192
+ {
1193
+ "epoch": 8.34,
1194
+ "grad_norm": 1.2639697790145874,
1195
+ "learning_rate": 0.0008315508021390375,
1196
+ "loss": 0.1979,
1197
+ "step": 1560
1198
+ },
1199
+ {
1200
+ "epoch": 8.4,
1201
+ "grad_norm": 1.108430027961731,
1202
+ "learning_rate": 0.0008048128342245989,
1203
+ "loss": 0.2051,
1204
+ "step": 1570
1205
+ },
1206
+ {
1207
+ "epoch": 8.45,
1208
+ "grad_norm": 2.08953857421875,
1209
+ "learning_rate": 0.0007780748663101605,
1210
+ "loss": 0.2306,
1211
+ "step": 1580
1212
+ },
1213
+ {
1214
+ "epoch": 8.5,
1215
+ "grad_norm": 1.464694857597351,
1216
+ "learning_rate": 0.000751336898395722,
1217
+ "loss": 0.1992,
1218
+ "step": 1590
1219
+ },
1220
+ {
1221
+ "epoch": 8.56,
1222
+ "grad_norm": 1.4773173332214355,
1223
+ "learning_rate": 0.0007245989304812835,
1224
+ "loss": 0.1764,
1225
+ "step": 1600
1226
+ },
1227
+ {
1228
+ "epoch": 8.61,
1229
+ "grad_norm": 2.048029661178589,
1230
+ "learning_rate": 0.000697860962566845,
1231
+ "loss": 0.237,
1232
+ "step": 1610
1233
+ },
1234
+ {
1235
+ "epoch": 8.66,
1236
+ "grad_norm": 1.0951212644577026,
1237
+ "learning_rate": 0.0006711229946524064,
1238
+ "loss": 0.1821,
1239
+ "step": 1620
1240
+ },
1241
+ {
1242
+ "epoch": 8.72,
1243
+ "grad_norm": 1.084712028503418,
1244
+ "learning_rate": 0.0006443850267379679,
1245
+ "loss": 0.1947,
1246
+ "step": 1630
1247
+ },
1248
+ {
1249
+ "epoch": 8.77,
1250
+ "grad_norm": 1.007285714149475,
1251
+ "learning_rate": 0.0006176470588235294,
1252
+ "loss": 0.2014,
1253
+ "step": 1640
1254
+ },
1255
+ {
1256
+ "epoch": 8.82,
1257
+ "grad_norm": 1.0643844604492188,
1258
+ "learning_rate": 0.0005909090909090909,
1259
+ "loss": 0.2411,
1260
+ "step": 1650
1261
+ },
1262
+ {
1263
+ "epoch": 8.88,
1264
+ "grad_norm": 2.0171964168548584,
1265
+ "learning_rate": 0.0005641711229946525,
1266
+ "loss": 0.2297,
1267
+ "step": 1660
1268
+ },
1269
+ {
1270
+ "epoch": 8.93,
1271
+ "grad_norm": 0.8814995884895325,
1272
+ "learning_rate": 0.0005374331550802139,
1273
+ "loss": 0.2052,
1274
+ "step": 1670
1275
+ },
1276
+ {
1277
+ "epoch": 8.98,
1278
+ "grad_norm": 1.338088035583496,
1279
+ "learning_rate": 0.0005106951871657754,
1280
+ "loss": 0.1928,
1281
+ "step": 1680
1282
+ },
1283
+ {
1284
+ "epoch": 9.0,
1285
+ "eval_accuracy": 0.9649532710280374,
1286
+ "eval_f1": 0.9624133353031232,
1287
+ "eval_loss": 0.09664417803287506,
1288
+ "eval_precision": 0.9632215980141733,
1289
+ "eval_recall": 0.9628324486352646,
1290
+ "eval_runtime": 19.7505,
1291
+ "eval_samples_per_second": 86.681,
1292
+ "eval_steps_per_second": 5.418,
1293
+ "step": 1683
1294
+ },
1295
+ {
1296
+ "epoch": 9.04,
1297
+ "grad_norm": 1.1753814220428467,
1298
+ "learning_rate": 0.0004839572192513369,
1299
+ "loss": 0.1862,
1300
+ "step": 1690
1301
+ },
1302
+ {
1303
+ "epoch": 9.09,
1304
+ "grad_norm": 0.9707505702972412,
1305
+ "learning_rate": 0.0004572192513368984,
1306
+ "loss": 0.2182,
1307
+ "step": 1700
1308
+ },
1309
+ {
1310
+ "epoch": 9.14,
1311
+ "grad_norm": 0.9967671632766724,
1312
+ "learning_rate": 0.0004304812834224599,
1313
+ "loss": 0.1923,
1314
+ "step": 1710
1315
+ },
1316
+ {
1317
+ "epoch": 9.2,
1318
+ "grad_norm": 1.496031641960144,
1319
+ "learning_rate": 0.00040374331550802143,
1320
+ "loss": 0.2105,
1321
+ "step": 1720
1322
+ },
1323
+ {
1324
+ "epoch": 9.25,
1325
+ "grad_norm": 0.8774816393852234,
1326
+ "learning_rate": 0.00037700534759358285,
1327
+ "loss": 0.1969,
1328
+ "step": 1730
1329
+ },
1330
+ {
1331
+ "epoch": 9.3,
1332
+ "grad_norm": 0.6063610315322876,
1333
+ "learning_rate": 0.0003502673796791444,
1334
+ "loss": 0.1577,
1335
+ "step": 1740
1336
+ },
1337
+ {
1338
+ "epoch": 9.36,
1339
+ "grad_norm": 0.8216743469238281,
1340
+ "learning_rate": 0.0003235294117647059,
1341
+ "loss": 0.2064,
1342
+ "step": 1750
1343
+ },
1344
+ {
1345
+ "epoch": 9.41,
1346
+ "grad_norm": 0.7338688373565674,
1347
+ "learning_rate": 0.0002967914438502674,
1348
+ "loss": 0.1793,
1349
+ "step": 1760
1350
+ },
1351
+ {
1352
+ "epoch": 9.47,
1353
+ "grad_norm": 0.910650372505188,
1354
+ "learning_rate": 0.00027005347593582886,
1355
+ "loss": 0.194,
1356
+ "step": 1770
1357
+ },
1358
+ {
1359
+ "epoch": 9.52,
1360
+ "grad_norm": 0.7778304219245911,
1361
+ "learning_rate": 0.00024331550802139036,
1362
+ "loss": 0.2203,
1363
+ "step": 1780
1364
+ },
1365
+ {
1366
+ "epoch": 9.57,
1367
+ "grad_norm": 1.0693227052688599,
1368
+ "learning_rate": 0.00021657754010695186,
1369
+ "loss": 0.1718,
1370
+ "step": 1790
1371
+ },
1372
+ {
1373
+ "epoch": 9.63,
1374
+ "grad_norm": 1.4808011054992676,
1375
+ "learning_rate": 0.0001898395721925134,
1376
+ "loss": 0.1696,
1377
+ "step": 1800
1378
+ },
1379
+ {
1380
+ "epoch": 9.68,
1381
+ "grad_norm": 0.8625634908676147,
1382
+ "learning_rate": 0.0001631016042780749,
1383
+ "loss": 0.1875,
1384
+ "step": 1810
1385
+ },
1386
+ {
1387
+ "epoch": 9.73,
1388
+ "grad_norm": 1.1236218214035034,
1389
+ "learning_rate": 0.00013636363636363637,
1390
+ "loss": 0.1772,
1391
+ "step": 1820
1392
+ },
1393
+ {
1394
+ "epoch": 9.79,
1395
+ "grad_norm": 1.027061939239502,
1396
+ "learning_rate": 0.00010962566844919787,
1397
+ "loss": 0.2274,
1398
+ "step": 1830
1399
+ },
1400
+ {
1401
+ "epoch": 9.84,
1402
+ "grad_norm": 0.977976381778717,
1403
+ "learning_rate": 8.288770053475936e-05,
1404
+ "loss": 0.1672,
1405
+ "step": 1840
1406
+ },
1407
+ {
1408
+ "epoch": 9.89,
1409
+ "grad_norm": 0.957969069480896,
1410
+ "learning_rate": 5.614973262032086e-05,
1411
+ "loss": 0.1966,
1412
+ "step": 1850
1413
+ },
1414
+ {
1415
+ "epoch": 9.95,
1416
+ "grad_norm": 0.6182002425193787,
1417
+ "learning_rate": 2.9411764705882354e-05,
1418
+ "loss": 0.1546,
1419
+ "step": 1860
1420
+ },
1421
+ {
1422
+ "epoch": 10.0,
1423
+ "grad_norm": 1.8023217916488647,
1424
+ "learning_rate": 2.6737967914438504e-06,
1425
+ "loss": 0.1901,
1426
+ "step": 1870
1427
+ },
1428
+ {
1429
+ "epoch": 10.0,
1430
+ "eval_accuracy": 0.9713785046728972,
1431
+ "eval_f1": 0.9692014832223894,
1432
+ "eval_loss": 0.07747028768062592,
1433
+ "eval_precision": 0.968992240300534,
1434
+ "eval_recall": 0.9698888041231651,
1435
+ "eval_runtime": 19.6225,
1436
+ "eval_samples_per_second": 87.247,
1437
+ "eval_steps_per_second": 5.453,
1438
+ "step": 1870
1439
+ },
1440
+ {
1441
+ "epoch": 10.0,
1442
+ "step": 1870,
1443
+ "total_flos": 2.1188849626596557e+19,
1444
+ "train_loss": 0.34750947773775315,
1445
+ "train_runtime": 3122.2212,
1446
+ "train_samples_per_second": 38.303,
1447
+ "train_steps_per_second": 0.599
1448
+ }
1449
+ ],
1450
+ "logging_steps": 10,
1451
+ "max_steps": 1870,
1452
+ "num_input_tokens_seen": 0,
1453
+ "num_train_epochs": 10,
1454
+ "save_steps": 500,
1455
+ "total_flos": 2.1188849626596557e+19,
1456
+ "train_batch_size": 16,
1457
+ "trial_name": null,
1458
+ "trial_params": null
1459
+ }
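
For orientation, the fields logged in trainer_state.json (logging_steps 10, num_train_epochs 10, train_batch_size 16, one evaluation per epoch, and a learning rate decaying roughly linearly from 5e-3 to 0 over 1870 steps) point to a Trainer configuration along these lines. This is an inference from the logged state, not the author's actual training script:

```python
# Rough reconstruction of the TrainingArguments implied by trainer_state.json above.
# Every value is read off the logged state or inferred from it; treat this as an
# illustrative sketch only, not the script that produced this commit.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="swin-large-patch4-window7-224-in22k-finetuned-lora-medmnistv2",
    learning_rate=5e-3,                # log shows ~4.97e-3 at step 10 of 1870 with linear decay to 0
    per_device_train_batch_size=16,    # "train_batch_size": 16
    num_train_epochs=10,               # "num_train_epochs": 10
    logging_steps=10,                  # "logging_steps": 10
    evaluation_strategy="epoch",       # eval entries appear once per epoch (newer versions call this eval_strategy)
    save_strategy="epoch",             # assumed, so the best checkpoint (checkpoint-1870) can be restored
    load_best_model_at_end=True,       # "best_metric" / "best_model_checkpoint" are tracked
    metric_for_best_model="accuracy",  # assumed: best_metric matches the best eval_accuracy
)
```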