th041 committed on
Commit d2d9a44 · verified · 1 Parent(s): fb31a3b

🍻 cheers

README.md CHANGED
@@ -2,6 +2,7 @@
  license: apache-2.0
  base_model: google/vit-base-patch16-224-in21k
  tags:
+ - image-classification
  - generated_from_trainer
  datasets:
  - imagefolder
@@ -22,7 +23,7 @@ model-index:
  metrics:
  - name: Accuracy
  type: accuracy
- value: 0.7899543378995434
+ value: 0.6894977168949772
  ---
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -32,8 +33,8 @@ should probably proofread and complete it, then remove this comment. -->
 
  This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) on the imagefolder dataset.
  It achieves the following results on the evaluation set:
- - Loss: 1.2067
- - Accuracy: 0.7900
+ - Loss: 0.7966
+ - Accuracy: 0.6895
 
  ## Model description
 
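For reference, the classifier described in this card can be exercised with the transformers image-classification pipeline. A minimal sketch, assuming the checkpoint is published under a repo id such as `th041/vit-weld-classify` (hypothetical, inferred from the checkpoint path in trainer_state.json below) and using a placeholder image path:

```python
# Minimal inference sketch for the fine-tuned ViT classifier described above.
# The repo id is an assumption inferred from the "vit-weld-classify" checkpoint
# path in trainer_state.json; substitute the actual id of this repository.
from transformers import pipeline

classifier = pipeline("image-classification", model="th041/vit-weld-classify")

# Any local image path, URL, or PIL.Image works here; the path is a placeholder.
predictions = classifier("example_weld.jpg")
for p in predictions:
    print(f"{p['label']}: {p['score']:.4f}")
```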
all_results.json ADDED
@@ -0,0 +1,13 @@
+ {
+ "epoch": 18.0,
+ "eval_accuracy": 0.6894977168949772,
+ "eval_loss": 0.7965957522392273,
+ "eval_runtime": 4.8253,
+ "eval_samples_per_second": 45.386,
+ "eval_steps_per_second": 5.803,
+ "total_flos": 2.739521370098516e+18,
+ "train_loss": 0.17171474754101115,
+ "train_runtime": 1017.2571,
+ "train_samples_per_second": 34.752,
+ "train_steps_per_second": 2.176
+ }
eval_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 18.0,
+ "eval_accuracy": 0.6894977168949772,
+ "eval_loss": 0.7965957522392273,
+ "eval_runtime": 4.8253,
+ "eval_samples_per_second": 45.386,
+ "eval_steps_per_second": 5.803
+ }
runs/May28_19-14-16_0846ebbfb3df/events.out.tfevents.1716925053.0846ebbfb3df.486.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:824e7c2f08cdeb8177e1a298fb320369004df32998ae657e31a2ff840684f5ec
+ size 411
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 18.0,
+ "total_flos": 2.739521370098516e+18,
+ "train_loss": 0.17171474754101115,
+ "train_runtime": 1017.2571,
+ "train_samples_per_second": 34.752,
+ "train_steps_per_second": 2.176
+ }
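The three metric files added above (all_results.json, eval_results.json, train_results.json) are plain JSON and can be pulled straight from the Hub. A minimal sketch, again assuming the hypothetical repo id `th041/vit-weld-classify`:

```python
# Sketch: download all_results.json from the Hub and print the headline metrics.
# The repo_id is an assumption; replace it with this repository's actual id.
import json
from huggingface_hub import hf_hub_download

path = hf_hub_download(repo_id="th041/vit-weld-classify", filename="all_results.json")
with open(path) as f:
    results = json.load(f)

print(f"eval_accuracy: {results['eval_accuracy']:.4f}")  # 0.6895 in this commit
print(f"eval_loss:     {results['eval_loss']:.4f}")      # 0.7966 in this commit
```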
trainer_state.json ADDED
@@ -0,0 +1,1787 @@
1
+ {
2
+ "best_metric": 0.7965957522392273,
3
+ "best_model_checkpoint": "vit-weld-classify/checkpoint-100",
4
+ "epoch": 18.0,
5
+ "eval_steps": 100,
6
+ "global_step": 2214,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.08130081300813008,
13
+ "grad_norm": 0.4229675233364105,
14
+ "learning_rate": 0.000199096657633243,
15
+ "loss": 1.1212,
16
+ "step": 10
17
+ },
18
+ {
19
+ "epoch": 0.16260162601626016,
20
+ "grad_norm": 1.1081550121307373,
21
+ "learning_rate": 0.000198193315266486,
22
+ "loss": 1.0661,
23
+ "step": 20
24
+ },
25
+ {
26
+ "epoch": 0.24390243902439024,
27
+ "grad_norm": 3.017395496368408,
28
+ "learning_rate": 0.000197289972899729,
29
+ "loss": 1.0507,
30
+ "step": 30
31
+ },
32
+ {
33
+ "epoch": 0.3252032520325203,
34
+ "grad_norm": 4.183485984802246,
35
+ "learning_rate": 0.00019638663053297203,
36
+ "loss": 1.0825,
37
+ "step": 40
38
+ },
39
+ {
40
+ "epoch": 0.4065040650406504,
41
+ "grad_norm": 1.9094836711883545,
42
+ "learning_rate": 0.000195483288166215,
43
+ "loss": 0.9802,
44
+ "step": 50
45
+ },
46
+ {
47
+ "epoch": 0.4878048780487805,
48
+ "grad_norm": 1.73495352268219,
49
+ "learning_rate": 0.000194579945799458,
50
+ "loss": 0.9875,
51
+ "step": 60
52
+ },
53
+ {
54
+ "epoch": 0.5691056910569106,
55
+ "grad_norm": 2.0932652950286865,
56
+ "learning_rate": 0.000193676603432701,
57
+ "loss": 0.9526,
58
+ "step": 70
59
+ },
60
+ {
61
+ "epoch": 0.6504065040650406,
62
+ "grad_norm": 2.3296732902526855,
63
+ "learning_rate": 0.000192773261065944,
64
+ "loss": 0.8794,
65
+ "step": 80
66
+ },
67
+ {
68
+ "epoch": 0.7317073170731707,
69
+ "grad_norm": 1.1494122743606567,
70
+ "learning_rate": 0.000191869918699187,
71
+ "loss": 0.9485,
72
+ "step": 90
73
+ },
74
+ {
75
+ "epoch": 0.8130081300813008,
76
+ "grad_norm": 1.6882137060165405,
77
+ "learning_rate": 0.00019096657633243,
78
+ "loss": 0.8686,
79
+ "step": 100
80
+ },
81
+ {
82
+ "epoch": 0.8130081300813008,
83
+ "eval_accuracy": 0.6894977168949772,
84
+ "eval_loss": 0.7965957522392273,
85
+ "eval_runtime": 1.897,
86
+ "eval_samples_per_second": 115.448,
87
+ "eval_steps_per_second": 14.76,
88
+ "step": 100
89
+ },
90
+ {
91
+ "epoch": 0.8943089430894309,
92
+ "grad_norm": 5.094268321990967,
93
+ "learning_rate": 0.000190063233965673,
94
+ "loss": 0.8394,
95
+ "step": 110
96
+ },
97
+ {
98
+ "epoch": 0.975609756097561,
99
+ "grad_norm": 2.215106248855591,
100
+ "learning_rate": 0.000189159891598916,
101
+ "loss": 0.7727,
102
+ "step": 120
103
+ },
104
+ {
105
+ "epoch": 1.056910569105691,
106
+ "grad_norm": 5.558369159698486,
107
+ "learning_rate": 0.000188256549232159,
108
+ "loss": 0.7774,
109
+ "step": 130
110
+ },
111
+ {
112
+ "epoch": 1.1382113821138211,
113
+ "grad_norm": 1.4229400157928467,
114
+ "learning_rate": 0.000187353206865402,
115
+ "loss": 0.8096,
116
+ "step": 140
117
+ },
118
+ {
119
+ "epoch": 1.2195121951219512,
120
+ "grad_norm": 4.707171440124512,
121
+ "learning_rate": 0.000186449864498645,
122
+ "loss": 0.816,
123
+ "step": 150
124
+ },
125
+ {
126
+ "epoch": 1.3008130081300813,
127
+ "grad_norm": 1.5261428356170654,
128
+ "learning_rate": 0.000185546522131888,
129
+ "loss": 0.6121,
130
+ "step": 160
131
+ },
132
+ {
133
+ "epoch": 1.3821138211382114,
134
+ "grad_norm": 3.1494133472442627,
135
+ "learning_rate": 0.000184643179765131,
136
+ "loss": 0.78,
137
+ "step": 170
138
+ },
139
+ {
140
+ "epoch": 1.4634146341463414,
141
+ "grad_norm": 7.678445816040039,
142
+ "learning_rate": 0.000183739837398374,
143
+ "loss": 0.6962,
144
+ "step": 180
145
+ },
146
+ {
147
+ "epoch": 1.5447154471544715,
148
+ "grad_norm": 2.71274471282959,
149
+ "learning_rate": 0.00018283649503161699,
150
+ "loss": 0.7503,
151
+ "step": 190
152
+ },
153
+ {
154
+ "epoch": 1.6260162601626016,
155
+ "grad_norm": 3.5023741722106934,
156
+ "learning_rate": 0.00018193315266485998,
157
+ "loss": 0.6935,
158
+ "step": 200
159
+ },
160
+ {
161
+ "epoch": 1.6260162601626016,
162
+ "eval_accuracy": 0.5068493150684932,
163
+ "eval_loss": 1.2217025756835938,
164
+ "eval_runtime": 2.1551,
165
+ "eval_samples_per_second": 101.619,
166
+ "eval_steps_per_second": 12.992,
167
+ "step": 200
168
+ },
169
+ {
170
+ "epoch": 1.7073170731707317,
171
+ "grad_norm": 3.2445621490478516,
172
+ "learning_rate": 0.000181029810298103,
173
+ "loss": 0.5929,
174
+ "step": 210
175
+ },
176
+ {
177
+ "epoch": 1.7886178861788617,
178
+ "grad_norm": 2.444040060043335,
179
+ "learning_rate": 0.000180126467931346,
180
+ "loss": 0.6668,
181
+ "step": 220
182
+ },
183
+ {
184
+ "epoch": 1.8699186991869918,
185
+ "grad_norm": 2.8826639652252197,
186
+ "learning_rate": 0.00017922312556458897,
187
+ "loss": 0.6119,
188
+ "step": 230
189
+ },
190
+ {
191
+ "epoch": 1.951219512195122,
192
+ "grad_norm": 1.261606216430664,
193
+ "learning_rate": 0.00017831978319783197,
194
+ "loss": 0.6275,
195
+ "step": 240
196
+ },
197
+ {
198
+ "epoch": 2.032520325203252,
199
+ "grad_norm": 4.046661376953125,
200
+ "learning_rate": 0.00017741644083107497,
201
+ "loss": 0.5644,
202
+ "step": 250
203
+ },
204
+ {
205
+ "epoch": 2.113821138211382,
206
+ "grad_norm": 2.8075666427612305,
207
+ "learning_rate": 0.000176513098464318,
208
+ "loss": 0.4775,
209
+ "step": 260
210
+ },
211
+ {
212
+ "epoch": 2.1951219512195124,
213
+ "grad_norm": 11.482375144958496,
214
+ "learning_rate": 0.000175609756097561,
215
+ "loss": 0.6515,
216
+ "step": 270
217
+ },
218
+ {
219
+ "epoch": 2.2764227642276422,
220
+ "grad_norm": 7.770543098449707,
221
+ "learning_rate": 0.000174706413730804,
222
+ "loss": 0.4004,
223
+ "step": 280
224
+ },
225
+ {
226
+ "epoch": 2.3577235772357725,
227
+ "grad_norm": 2.5070407390594482,
228
+ "learning_rate": 0.00017380307136404699,
229
+ "loss": 0.3705,
230
+ "step": 290
231
+ },
232
+ {
233
+ "epoch": 2.4390243902439024,
234
+ "grad_norm": 2.954922914505005,
235
+ "learning_rate": 0.00017289972899728998,
236
+ "loss": 0.4225,
237
+ "step": 300
238
+ },
239
+ {
240
+ "epoch": 2.4390243902439024,
241
+ "eval_accuracy": 0.6210045662100456,
242
+ "eval_loss": 0.9592322707176208,
243
+ "eval_runtime": 2.2647,
244
+ "eval_samples_per_second": 96.703,
245
+ "eval_steps_per_second": 12.364,
246
+ "step": 300
247
+ },
248
+ {
249
+ "epoch": 2.5203252032520327,
250
+ "grad_norm": 2.0386438369750977,
251
+ "learning_rate": 0.00017199638663053298,
252
+ "loss": 0.4874,
253
+ "step": 310
254
+ },
255
+ {
256
+ "epoch": 2.6016260162601625,
257
+ "grad_norm": 7.5740180015563965,
258
+ "learning_rate": 0.00017109304426377598,
259
+ "loss": 0.5005,
260
+ "step": 320
261
+ },
262
+ {
263
+ "epoch": 2.682926829268293,
264
+ "grad_norm": 4.629195690155029,
265
+ "learning_rate": 0.00017018970189701897,
266
+ "loss": 0.4861,
267
+ "step": 330
268
+ },
269
+ {
270
+ "epoch": 2.7642276422764227,
271
+ "grad_norm": 5.8368072509765625,
272
+ "learning_rate": 0.00016928635953026197,
273
+ "loss": 0.5013,
274
+ "step": 340
275
+ },
276
+ {
277
+ "epoch": 2.845528455284553,
278
+ "grad_norm": 2.3338427543640137,
279
+ "learning_rate": 0.00016838301716350497,
280
+ "loss": 0.53,
281
+ "step": 350
282
+ },
283
+ {
284
+ "epoch": 2.926829268292683,
285
+ "grad_norm": 4.834186553955078,
286
+ "learning_rate": 0.00016747967479674797,
287
+ "loss": 0.4889,
288
+ "step": 360
289
+ },
290
+ {
291
+ "epoch": 3.008130081300813,
292
+ "grad_norm": 1.1826306581497192,
293
+ "learning_rate": 0.000166576332429991,
294
+ "loss": 0.4451,
295
+ "step": 370
296
+ },
297
+ {
298
+ "epoch": 3.089430894308943,
299
+ "grad_norm": 2.3115830421447754,
300
+ "learning_rate": 0.000165672990063234,
301
+ "loss": 0.4455,
302
+ "step": 380
303
+ },
304
+ {
305
+ "epoch": 3.1707317073170733,
306
+ "grad_norm": 2.7050156593322754,
307
+ "learning_rate": 0.00016476964769647699,
308
+ "loss": 0.1422,
309
+ "step": 390
310
+ },
311
+ {
312
+ "epoch": 3.252032520325203,
313
+ "grad_norm": 0.68967604637146,
314
+ "learning_rate": 0.00016386630532971998,
315
+ "loss": 0.2586,
316
+ "step": 400
317
+ },
318
+ {
319
+ "epoch": 3.252032520325203,
320
+ "eval_accuracy": 0.593607305936073,
321
+ "eval_loss": 1.312296986579895,
322
+ "eval_runtime": 1.8615,
323
+ "eval_samples_per_second": 117.649,
324
+ "eval_steps_per_second": 15.042,
325
+ "step": 400
326
+ },
327
+ {
328
+ "epoch": 3.3333333333333335,
329
+ "grad_norm": 8.18450927734375,
330
+ "learning_rate": 0.00016296296296296295,
331
+ "loss": 0.5364,
332
+ "step": 410
333
+ },
334
+ {
335
+ "epoch": 3.4146341463414633,
336
+ "grad_norm": 2.960862874984741,
337
+ "learning_rate": 0.00016205962059620595,
338
+ "loss": 0.2547,
339
+ "step": 420
340
+ },
341
+ {
342
+ "epoch": 3.4959349593495936,
343
+ "grad_norm": 0.5109916925430298,
344
+ "learning_rate": 0.00016115627822944897,
345
+ "loss": 0.2994,
346
+ "step": 430
347
+ },
348
+ {
349
+ "epoch": 3.5772357723577235,
350
+ "grad_norm": 3.0849831104278564,
351
+ "learning_rate": 0.00016025293586269197,
352
+ "loss": 0.2622,
353
+ "step": 440
354
+ },
355
+ {
356
+ "epoch": 3.658536585365854,
357
+ "grad_norm": 2.731576442718506,
358
+ "learning_rate": 0.00015934959349593497,
359
+ "loss": 0.3485,
360
+ "step": 450
361
+ },
362
+ {
363
+ "epoch": 3.7398373983739837,
364
+ "grad_norm": 2.58368182182312,
365
+ "learning_rate": 0.00015844625112917797,
366
+ "loss": 0.2973,
367
+ "step": 460
368
+ },
369
+ {
370
+ "epoch": 3.821138211382114,
371
+ "grad_norm": 3.8228554725646973,
372
+ "learning_rate": 0.00015754290876242096,
373
+ "loss": 0.4206,
374
+ "step": 470
375
+ },
376
+ {
377
+ "epoch": 3.902439024390244,
378
+ "grad_norm": 10.00378131866455,
379
+ "learning_rate": 0.00015663956639566396,
380
+ "loss": 0.368,
381
+ "step": 480
382
+ },
383
+ {
384
+ "epoch": 3.983739837398374,
385
+ "grad_norm": 5.630249977111816,
386
+ "learning_rate": 0.00015573622402890699,
387
+ "loss": 0.3679,
388
+ "step": 490
389
+ },
390
+ {
391
+ "epoch": 4.065040650406504,
392
+ "grad_norm": 2.958164930343628,
393
+ "learning_rate": 0.00015483288166214996,
394
+ "loss": 0.237,
395
+ "step": 500
396
+ },
397
+ {
398
+ "epoch": 4.065040650406504,
399
+ "eval_accuracy": 0.6986301369863014,
400
+ "eval_loss": 0.8074837327003479,
401
+ "eval_runtime": 2.4126,
402
+ "eval_samples_per_second": 90.775,
403
+ "eval_steps_per_second": 11.606,
404
+ "step": 500
405
+ },
406
+ {
407
+ "epoch": 4.146341463414634,
408
+ "grad_norm": 1.0017706155776978,
409
+ "learning_rate": 0.00015392953929539295,
410
+ "loss": 0.2833,
411
+ "step": 510
412
+ },
413
+ {
414
+ "epoch": 4.227642276422764,
415
+ "grad_norm": 0.5290637612342834,
416
+ "learning_rate": 0.00015302619692863595,
417
+ "loss": 0.1401,
418
+ "step": 520
419
+ },
420
+ {
421
+ "epoch": 4.308943089430894,
422
+ "grad_norm": 9.38740348815918,
423
+ "learning_rate": 0.00015212285456187895,
424
+ "loss": 0.196,
425
+ "step": 530
426
+ },
427
+ {
428
+ "epoch": 4.390243902439025,
429
+ "grad_norm": 4.181729793548584,
430
+ "learning_rate": 0.00015121951219512197,
431
+ "loss": 0.1724,
432
+ "step": 540
433
+ },
434
+ {
435
+ "epoch": 4.471544715447155,
436
+ "grad_norm": 3.6319236755371094,
437
+ "learning_rate": 0.00015031616982836497,
438
+ "loss": 0.1901,
439
+ "step": 550
440
+ },
441
+ {
442
+ "epoch": 4.5528455284552845,
443
+ "grad_norm": 0.17511798441410065,
444
+ "learning_rate": 0.00014941282746160797,
445
+ "loss": 0.1958,
446
+ "step": 560
447
+ },
448
+ {
449
+ "epoch": 4.634146341463414,
450
+ "grad_norm": 3.161402702331543,
451
+ "learning_rate": 0.00014850948509485096,
452
+ "loss": 0.231,
453
+ "step": 570
454
+ },
455
+ {
456
+ "epoch": 4.715447154471545,
457
+ "grad_norm": 5.447460174560547,
458
+ "learning_rate": 0.00014760614272809396,
459
+ "loss": 0.25,
460
+ "step": 580
461
+ },
462
+ {
463
+ "epoch": 4.796747967479675,
464
+ "grad_norm": 5.651242256164551,
465
+ "learning_rate": 0.00014670280036133696,
466
+ "loss": 0.2985,
467
+ "step": 590
468
+ },
469
+ {
470
+ "epoch": 4.878048780487805,
471
+ "grad_norm": 1.8422802686691284,
472
+ "learning_rate": 0.00014579945799457996,
473
+ "loss": 0.2658,
474
+ "step": 600
475
+ },
476
+ {
477
+ "epoch": 4.878048780487805,
478
+ "eval_accuracy": 0.6210045662100456,
479
+ "eval_loss": 1.0878099203109741,
480
+ "eval_runtime": 1.8933,
481
+ "eval_samples_per_second": 115.67,
482
+ "eval_steps_per_second": 14.789,
483
+ "step": 600
484
+ },
485
+ {
486
+ "epoch": 4.959349593495935,
487
+ "grad_norm": 2.05578875541687,
488
+ "learning_rate": 0.00014489611562782295,
489
+ "loss": 0.2486,
490
+ "step": 610
491
+ },
492
+ {
493
+ "epoch": 5.040650406504065,
494
+ "grad_norm": 3.98470139503479,
495
+ "learning_rate": 0.00014399277326106595,
496
+ "loss": 0.179,
497
+ "step": 620
498
+ },
499
+ {
500
+ "epoch": 5.121951219512195,
501
+ "grad_norm": 1.0857895612716675,
502
+ "learning_rate": 0.00014308943089430895,
503
+ "loss": 0.1288,
504
+ "step": 630
505
+ },
506
+ {
507
+ "epoch": 5.203252032520325,
508
+ "grad_norm": 4.330402851104736,
509
+ "learning_rate": 0.00014218608852755194,
510
+ "loss": 0.3725,
511
+ "step": 640
512
+ },
513
+ {
514
+ "epoch": 5.284552845528455,
515
+ "grad_norm": 0.4425082206726074,
516
+ "learning_rate": 0.00014128274616079494,
517
+ "loss": 0.1386,
518
+ "step": 650
519
+ },
520
+ {
521
+ "epoch": 5.365853658536586,
522
+ "grad_norm": 5.232783794403076,
523
+ "learning_rate": 0.00014037940379403797,
524
+ "loss": 0.1728,
525
+ "step": 660
526
+ },
527
+ {
528
+ "epoch": 5.4471544715447155,
529
+ "grad_norm": 0.9079999923706055,
530
+ "learning_rate": 0.00013947606142728094,
531
+ "loss": 0.0875,
532
+ "step": 670
533
+ },
534
+ {
535
+ "epoch": 5.528455284552845,
536
+ "grad_norm": 0.10030949860811234,
537
+ "learning_rate": 0.00013857271906052393,
538
+ "loss": 0.0395,
539
+ "step": 680
540
+ },
541
+ {
542
+ "epoch": 5.609756097560975,
543
+ "grad_norm": 10.699735641479492,
544
+ "learning_rate": 0.00013766937669376693,
545
+ "loss": 0.2565,
546
+ "step": 690
547
+ },
548
+ {
549
+ "epoch": 5.691056910569106,
550
+ "grad_norm": 6.3885626792907715,
551
+ "learning_rate": 0.00013676603432700993,
552
+ "loss": 0.1904,
553
+ "step": 700
554
+ },
555
+ {
556
+ "epoch": 5.691056910569106,
557
+ "eval_accuracy": 0.7168949771689498,
558
+ "eval_loss": 1.104848027229309,
559
+ "eval_runtime": 1.9992,
560
+ "eval_samples_per_second": 109.543,
561
+ "eval_steps_per_second": 14.005,
562
+ "step": 700
563
+ },
564
+ {
565
+ "epoch": 5.772357723577236,
566
+ "grad_norm": 6.820797920227051,
567
+ "learning_rate": 0.00013586269196025295,
568
+ "loss": 0.0646,
569
+ "step": 710
570
+ },
571
+ {
572
+ "epoch": 5.853658536585366,
573
+ "grad_norm": 13.246321678161621,
574
+ "learning_rate": 0.00013495934959349595,
575
+ "loss": 0.0444,
576
+ "step": 720
577
+ },
578
+ {
579
+ "epoch": 5.934959349593496,
580
+ "grad_norm": 5.0205278396606445,
581
+ "learning_rate": 0.00013405600722673895,
582
+ "loss": 0.0963,
583
+ "step": 730
584
+ },
585
+ {
586
+ "epoch": 6.016260162601626,
587
+ "grad_norm": 0.06230660900473595,
588
+ "learning_rate": 0.00013315266485998194,
589
+ "loss": 0.0949,
590
+ "step": 740
591
+ },
592
+ {
593
+ "epoch": 6.097560975609756,
594
+ "grad_norm": 7.405625343322754,
595
+ "learning_rate": 0.00013224932249322494,
596
+ "loss": 0.0483,
597
+ "step": 750
598
+ },
599
+ {
600
+ "epoch": 6.178861788617886,
601
+ "grad_norm": 1.0345286130905151,
602
+ "learning_rate": 0.00013134598012646794,
603
+ "loss": 0.1235,
604
+ "step": 760
605
+ },
606
+ {
607
+ "epoch": 6.260162601626016,
608
+ "grad_norm": 0.08151334524154663,
609
+ "learning_rate": 0.00013044263775971094,
610
+ "loss": 0.0776,
611
+ "step": 770
612
+ },
613
+ {
614
+ "epoch": 6.341463414634147,
615
+ "grad_norm": 9.996505737304688,
616
+ "learning_rate": 0.00012953929539295393,
617
+ "loss": 0.0529,
618
+ "step": 780
619
+ },
620
+ {
621
+ "epoch": 6.4227642276422765,
622
+ "grad_norm": 0.07011505216360092,
623
+ "learning_rate": 0.00012863595302619693,
624
+ "loss": 0.0679,
625
+ "step": 790
626
+ },
627
+ {
628
+ "epoch": 6.504065040650406,
629
+ "grad_norm": 17.958656311035156,
630
+ "learning_rate": 0.00012773261065943993,
631
+ "loss": 0.0964,
632
+ "step": 800
633
+ },
634
+ {
635
+ "epoch": 6.504065040650406,
636
+ "eval_accuracy": 0.684931506849315,
637
+ "eval_loss": 1.3601832389831543,
638
+ "eval_runtime": 2.573,
639
+ "eval_samples_per_second": 85.116,
640
+ "eval_steps_per_second": 10.882,
641
+ "step": 800
642
+ },
643
+ {
644
+ "epoch": 6.585365853658536,
645
+ "grad_norm": 0.9644515514373779,
646
+ "learning_rate": 0.00012682926829268293,
647
+ "loss": 0.1316,
648
+ "step": 810
649
+ },
650
+ {
651
+ "epoch": 6.666666666666667,
652
+ "grad_norm": 1.0540770292282104,
653
+ "learning_rate": 0.00012592592592592592,
654
+ "loss": 0.0534,
655
+ "step": 820
656
+ },
657
+ {
658
+ "epoch": 6.747967479674797,
659
+ "grad_norm": 0.054866183549165726,
660
+ "learning_rate": 0.00012502258355916895,
661
+ "loss": 0.0815,
662
+ "step": 830
663
+ },
664
+ {
665
+ "epoch": 6.829268292682927,
666
+ "grad_norm": 0.04586861655116081,
667
+ "learning_rate": 0.00012411924119241194,
668
+ "loss": 0.0431,
669
+ "step": 840
670
+ },
671
+ {
672
+ "epoch": 6.9105691056910565,
673
+ "grad_norm": 0.1353844702243805,
674
+ "learning_rate": 0.00012321589882565491,
675
+ "loss": 0.0707,
676
+ "step": 850
677
+ },
678
+ {
679
+ "epoch": 6.991869918699187,
680
+ "grad_norm": 0.05645023286342621,
681
+ "learning_rate": 0.0001223125564588979,
682
+ "loss": 0.0362,
683
+ "step": 860
684
+ },
685
+ {
686
+ "epoch": 7.073170731707317,
687
+ "grad_norm": 14.361383438110352,
688
+ "learning_rate": 0.00012140921409214092,
689
+ "loss": 0.0788,
690
+ "step": 870
691
+ },
692
+ {
693
+ "epoch": 7.154471544715447,
694
+ "grad_norm": 0.6656379103660583,
695
+ "learning_rate": 0.00012050587172538392,
696
+ "loss": 0.0509,
697
+ "step": 880
698
+ },
699
+ {
700
+ "epoch": 7.235772357723577,
701
+ "grad_norm": 0.04472186788916588,
702
+ "learning_rate": 0.00011960252935862693,
703
+ "loss": 0.1443,
704
+ "step": 890
705
+ },
706
+ {
707
+ "epoch": 7.317073170731708,
708
+ "grad_norm": 1.4073199033737183,
709
+ "learning_rate": 0.00011869918699186993,
710
+ "loss": 0.0474,
711
+ "step": 900
712
+ },
713
+ {
714
+ "epoch": 7.317073170731708,
715
+ "eval_accuracy": 0.7671232876712328,
716
+ "eval_loss": 1.1331158876419067,
717
+ "eval_runtime": 1.8805,
718
+ "eval_samples_per_second": 116.46,
719
+ "eval_steps_per_second": 14.89,
720
+ "step": 900
721
+ },
722
+ {
723
+ "epoch": 7.3983739837398375,
724
+ "grad_norm": 7.994218826293945,
725
+ "learning_rate": 0.00011779584462511293,
726
+ "loss": 0.0686,
727
+ "step": 910
728
+ },
729
+ {
730
+ "epoch": 7.479674796747967,
731
+ "grad_norm": 3.781395673751831,
732
+ "learning_rate": 0.00011689250225835592,
733
+ "loss": 0.1022,
734
+ "step": 920
735
+ },
736
+ {
737
+ "epoch": 7.560975609756097,
738
+ "grad_norm": 0.045862827450037,
739
+ "learning_rate": 0.00011598915989159893,
740
+ "loss": 0.0991,
741
+ "step": 930
742
+ },
743
+ {
744
+ "epoch": 7.642276422764228,
745
+ "grad_norm": 0.25066855549812317,
746
+ "learning_rate": 0.0001150858175248419,
747
+ "loss": 0.0142,
748
+ "step": 940
749
+ },
750
+ {
751
+ "epoch": 7.723577235772358,
752
+ "grad_norm": 4.746194839477539,
753
+ "learning_rate": 0.00011418247515808491,
754
+ "loss": 0.0561,
755
+ "step": 950
756
+ },
757
+ {
758
+ "epoch": 7.804878048780488,
759
+ "grad_norm": 0.8477320075035095,
760
+ "learning_rate": 0.00011327913279132791,
761
+ "loss": 0.0645,
762
+ "step": 960
763
+ },
764
+ {
765
+ "epoch": 7.886178861788618,
766
+ "grad_norm": 0.5714166760444641,
767
+ "learning_rate": 0.00011237579042457091,
768
+ "loss": 0.0319,
769
+ "step": 970
770
+ },
771
+ {
772
+ "epoch": 7.967479674796748,
773
+ "grad_norm": 20.18045997619629,
774
+ "learning_rate": 0.00011147244805781392,
775
+ "loss": 0.0662,
776
+ "step": 980
777
+ },
778
+ {
779
+ "epoch": 8.048780487804878,
780
+ "grad_norm": 0.05691004544496536,
781
+ "learning_rate": 0.00011056910569105692,
782
+ "loss": 0.0668,
783
+ "step": 990
784
+ },
785
+ {
786
+ "epoch": 8.130081300813009,
787
+ "grad_norm": 0.12059393525123596,
788
+ "learning_rate": 0.00010966576332429991,
789
+ "loss": 0.1179,
790
+ "step": 1000
791
+ },
792
+ {
793
+ "epoch": 8.130081300813009,
794
+ "eval_accuracy": 0.730593607305936,
795
+ "eval_loss": 1.122756838798523,
796
+ "eval_runtime": 1.8559,
797
+ "eval_samples_per_second": 118.001,
798
+ "eval_steps_per_second": 15.087,
799
+ "step": 1000
800
+ },
801
+ {
802
+ "epoch": 8.211382113821138,
803
+ "grad_norm": 0.06441790610551834,
804
+ "learning_rate": 0.00010876242095754293,
805
+ "loss": 0.0362,
806
+ "step": 1010
807
+ },
808
+ {
809
+ "epoch": 8.292682926829269,
810
+ "grad_norm": 0.03285042569041252,
811
+ "learning_rate": 0.00010785907859078592,
812
+ "loss": 0.0277,
813
+ "step": 1020
814
+ },
815
+ {
816
+ "epoch": 8.373983739837398,
817
+ "grad_norm": 0.03812731057405472,
818
+ "learning_rate": 0.0001069557362240289,
819
+ "loss": 0.0094,
820
+ "step": 1030
821
+ },
822
+ {
823
+ "epoch": 8.455284552845528,
824
+ "grad_norm": 0.0354076623916626,
825
+ "learning_rate": 0.0001060523938572719,
826
+ "loss": 0.011,
827
+ "step": 1040
828
+ },
829
+ {
830
+ "epoch": 8.536585365853659,
831
+ "grad_norm": 0.05294572561979294,
832
+ "learning_rate": 0.0001051490514905149,
833
+ "loss": 0.0397,
834
+ "step": 1050
835
+ },
836
+ {
837
+ "epoch": 8.617886178861788,
838
+ "grad_norm": 0.029015598818659782,
839
+ "learning_rate": 0.00010424570912375791,
840
+ "loss": 0.0623,
841
+ "step": 1060
842
+ },
843
+ {
844
+ "epoch": 8.699186991869919,
845
+ "grad_norm": 0.02916688844561577,
846
+ "learning_rate": 0.00010334236675700091,
847
+ "loss": 0.0208,
848
+ "step": 1070
849
+ },
850
+ {
851
+ "epoch": 8.78048780487805,
852
+ "grad_norm": 2.3230416774749756,
853
+ "learning_rate": 0.0001024390243902439,
854
+ "loss": 0.0585,
855
+ "step": 1080
856
+ },
857
+ {
858
+ "epoch": 8.861788617886178,
859
+ "grad_norm": 0.024655325338244438,
860
+ "learning_rate": 0.00010153568202348692,
861
+ "loss": 0.0474,
862
+ "step": 1090
863
+ },
864
+ {
865
+ "epoch": 8.94308943089431,
866
+ "grad_norm": 0.09337528795003891,
867
+ "learning_rate": 0.00010063233965672991,
868
+ "loss": 0.0447,
869
+ "step": 1100
870
+ },
871
+ {
872
+ "epoch": 8.94308943089431,
873
+ "eval_accuracy": 0.7397260273972602,
874
+ "eval_loss": 1.260903239250183,
875
+ "eval_runtime": 2.0272,
876
+ "eval_samples_per_second": 108.029,
877
+ "eval_steps_per_second": 13.812,
878
+ "step": 1100
879
+ },
880
+ {
881
+ "epoch": 9.024390243902438,
882
+ "grad_norm": 0.06281863152980804,
883
+ "learning_rate": 9.97289972899729e-05,
884
+ "loss": 0.0153,
885
+ "step": 1110
886
+ },
887
+ {
888
+ "epoch": 9.105691056910569,
889
+ "grad_norm": 0.025806061923503876,
890
+ "learning_rate": 9.882565492321591e-05,
891
+ "loss": 0.0046,
892
+ "step": 1120
893
+ },
894
+ {
895
+ "epoch": 9.1869918699187,
896
+ "grad_norm": 0.024874288588762283,
897
+ "learning_rate": 9.79223125564589e-05,
898
+ "loss": 0.0055,
899
+ "step": 1130
900
+ },
901
+ {
902
+ "epoch": 9.268292682926829,
903
+ "grad_norm": 0.03554617613554001,
904
+ "learning_rate": 9.701897018970189e-05,
905
+ "loss": 0.006,
906
+ "step": 1140
907
+ },
908
+ {
909
+ "epoch": 9.34959349593496,
910
+ "grad_norm": 0.020841993391513824,
911
+ "learning_rate": 9.61156278229449e-05,
912
+ "loss": 0.004,
913
+ "step": 1150
914
+ },
915
+ {
916
+ "epoch": 9.43089430894309,
917
+ "grad_norm": 0.02096891961991787,
918
+ "learning_rate": 9.52122854561879e-05,
919
+ "loss": 0.0318,
920
+ "step": 1160
921
+ },
922
+ {
923
+ "epoch": 9.512195121951219,
924
+ "grad_norm": 0.024508224800229073,
925
+ "learning_rate": 9.43089430894309e-05,
926
+ "loss": 0.0423,
927
+ "step": 1170
928
+ },
929
+ {
930
+ "epoch": 9.59349593495935,
931
+ "grad_norm": 0.024349646642804146,
932
+ "learning_rate": 9.34056007226739e-05,
933
+ "loss": 0.0432,
934
+ "step": 1180
935
+ },
936
+ {
937
+ "epoch": 9.67479674796748,
938
+ "grad_norm": 0.024144969880580902,
939
+ "learning_rate": 9.250225835591689e-05,
940
+ "loss": 0.036,
941
+ "step": 1190
942
+ },
943
+ {
944
+ "epoch": 9.75609756097561,
945
+ "grad_norm": 0.025365758687257767,
946
+ "learning_rate": 9.15989159891599e-05,
947
+ "loss": 0.0043,
948
+ "step": 1200
949
+ },
950
+ {
951
+ "epoch": 9.75609756097561,
952
+ "eval_accuracy": 0.776255707762557,
953
+ "eval_loss": 1.1745853424072266,
954
+ "eval_runtime": 1.8633,
955
+ "eval_samples_per_second": 117.532,
956
+ "eval_steps_per_second": 15.027,
957
+ "step": 1200
958
+ },
959
+ {
960
+ "epoch": 9.83739837398374,
961
+ "grad_norm": 0.020441517233848572,
962
+ "learning_rate": 9.06955736224029e-05,
963
+ "loss": 0.0048,
964
+ "step": 1210
965
+ },
966
+ {
967
+ "epoch": 9.91869918699187,
968
+ "grad_norm": 7.5243239402771,
969
+ "learning_rate": 8.97922312556459e-05,
970
+ "loss": 0.0566,
971
+ "step": 1220
972
+ },
973
+ {
974
+ "epoch": 10.0,
975
+ "grad_norm": 0.02440631203353405,
976
+ "learning_rate": 8.888888888888889e-05,
977
+ "loss": 0.0663,
978
+ "step": 1230
979
+ },
980
+ {
981
+ "epoch": 10.08130081300813,
982
+ "grad_norm": 0.020575933158397675,
983
+ "learning_rate": 8.798554652213189e-05,
984
+ "loss": 0.0316,
985
+ "step": 1240
986
+ },
987
+ {
988
+ "epoch": 10.16260162601626,
989
+ "grad_norm": 0.0206364244222641,
990
+ "learning_rate": 8.708220415537489e-05,
991
+ "loss": 0.021,
992
+ "step": 1250
993
+ },
994
+ {
995
+ "epoch": 10.24390243902439,
996
+ "grad_norm": 0.025878455489873886,
997
+ "learning_rate": 8.61788617886179e-05,
998
+ "loss": 0.0159,
999
+ "step": 1260
1000
+ },
1001
+ {
1002
+ "epoch": 10.32520325203252,
1003
+ "grad_norm": 0.03842271491885185,
1004
+ "learning_rate": 8.52755194218609e-05,
1005
+ "loss": 0.0035,
1006
+ "step": 1270
1007
+ },
1008
+ {
1009
+ "epoch": 10.40650406504065,
1010
+ "grad_norm": 0.017291786149144173,
1011
+ "learning_rate": 8.437217705510388e-05,
1012
+ "loss": 0.004,
1013
+ "step": 1280
1014
+ },
1015
+ {
1016
+ "epoch": 10.487804878048781,
1017
+ "grad_norm": 0.01711260713636875,
1018
+ "learning_rate": 8.346883468834689e-05,
1019
+ "loss": 0.0034,
1020
+ "step": 1290
1021
+ },
1022
+ {
1023
+ "epoch": 10.56910569105691,
1024
+ "grad_norm": 0.5960690975189209,
1025
+ "learning_rate": 8.256549232158989e-05,
1026
+ "loss": 0.1059,
1027
+ "step": 1300
1028
+ },
1029
+ {
1030
+ "epoch": 10.56910569105691,
1031
+ "eval_accuracy": 0.776255707762557,
1032
+ "eval_loss": 1.186672568321228,
1033
+ "eval_runtime": 2.3237,
1034
+ "eval_samples_per_second": 94.245,
1035
+ "eval_steps_per_second": 12.05,
1036
+ "step": 1300
1037
+ },
1038
+ {
1039
+ "epoch": 10.65040650406504,
1040
+ "grad_norm": 0.024842064827680588,
1041
+ "learning_rate": 8.166214995483289e-05,
1042
+ "loss": 0.027,
1043
+ "step": 1310
1044
+ },
1045
+ {
1046
+ "epoch": 10.731707317073171,
1047
+ "grad_norm": 0.022030914202332497,
1048
+ "learning_rate": 8.075880758807588e-05,
1049
+ "loss": 0.0275,
1050
+ "step": 1320
1051
+ },
1052
+ {
1053
+ "epoch": 10.8130081300813,
1054
+ "grad_norm": 0.024346347898244858,
1055
+ "learning_rate": 7.985546522131888e-05,
1056
+ "loss": 0.0037,
1057
+ "step": 1330
1058
+ },
1059
+ {
1060
+ "epoch": 10.894308943089431,
1061
+ "grad_norm": 0.019560035318136215,
1062
+ "learning_rate": 7.895212285456188e-05,
1063
+ "loss": 0.0037,
1064
+ "step": 1340
1065
+ },
1066
+ {
1067
+ "epoch": 10.975609756097562,
1068
+ "grad_norm": 0.016407785937190056,
1069
+ "learning_rate": 7.804878048780489e-05,
1070
+ "loss": 0.0132,
1071
+ "step": 1350
1072
+ },
1073
+ {
1074
+ "epoch": 11.05691056910569,
1075
+ "grad_norm": 0.015580049715936184,
1076
+ "learning_rate": 7.714543812104789e-05,
1077
+ "loss": 0.0028,
1078
+ "step": 1360
1079
+ },
1080
+ {
1081
+ "epoch": 11.138211382113822,
1082
+ "grad_norm": 0.01547026913613081,
1083
+ "learning_rate": 7.624209575429088e-05,
1084
+ "loss": 0.019,
1085
+ "step": 1370
1086
+ },
1087
+ {
1088
+ "epoch": 11.21951219512195,
1089
+ "grad_norm": 0.01568036712706089,
1090
+ "learning_rate": 7.533875338753388e-05,
1091
+ "loss": 0.0026,
1092
+ "step": 1380
1093
+ },
1094
+ {
1095
+ "epoch": 11.300813008130081,
1096
+ "grad_norm": 0.016332248225808144,
1097
+ "learning_rate": 7.443541102077688e-05,
1098
+ "loss": 0.003,
1099
+ "step": 1390
1100
+ },
1101
+ {
1102
+ "epoch": 11.382113821138212,
1103
+ "grad_norm": 0.015115007758140564,
1104
+ "learning_rate": 7.353206865401989e-05,
1105
+ "loss": 0.0026,
1106
+ "step": 1400
1107
+ },
1108
+ {
1109
+ "epoch": 11.382113821138212,
1110
+ "eval_accuracy": 0.7534246575342466,
1111
+ "eval_loss": 1.2890268564224243,
1112
+ "eval_runtime": 1.8821,
1113
+ "eval_samples_per_second": 116.362,
1114
+ "eval_steps_per_second": 14.877,
1115
+ "step": 1400
1116
+ },
1117
+ {
1118
+ "epoch": 11.463414634146341,
1119
+ "grad_norm": 0.019173264503479004,
1120
+ "learning_rate": 7.262872628726287e-05,
1121
+ "loss": 0.0025,
1122
+ "step": 1410
1123
+ },
1124
+ {
1125
+ "epoch": 11.544715447154472,
1126
+ "grad_norm": 0.01367194764316082,
1127
+ "learning_rate": 7.172538392050587e-05,
1128
+ "loss": 0.0025,
1129
+ "step": 1420
1130
+ },
1131
+ {
1132
+ "epoch": 11.6260162601626,
1133
+ "grad_norm": 0.012898310087621212,
1134
+ "learning_rate": 7.082204155374888e-05,
1135
+ "loss": 0.0024,
1136
+ "step": 1430
1137
+ },
1138
+ {
1139
+ "epoch": 11.707317073170731,
1140
+ "grad_norm": 0.014037015847861767,
1141
+ "learning_rate": 6.991869918699188e-05,
1142
+ "loss": 0.0023,
1143
+ "step": 1440
1144
+ },
1145
+ {
1146
+ "epoch": 11.788617886178862,
1147
+ "grad_norm": 0.012620938010513783,
1148
+ "learning_rate": 6.901535682023487e-05,
1149
+ "loss": 0.0023,
1150
+ "step": 1450
1151
+ },
1152
+ {
1153
+ "epoch": 11.869918699186991,
1154
+ "grad_norm": 0.0131882568821311,
1155
+ "learning_rate": 6.811201445347787e-05,
1156
+ "loss": 0.0023,
1157
+ "step": 1460
1158
+ },
1159
+ {
1160
+ "epoch": 11.951219512195122,
1161
+ "grad_norm": 0.01338967401534319,
1162
+ "learning_rate": 6.720867208672087e-05,
1163
+ "loss": 0.0022,
1164
+ "step": 1470
1165
+ },
1166
+ {
1167
+ "epoch": 12.032520325203253,
1168
+ "grad_norm": 0.01590082049369812,
1169
+ "learning_rate": 6.630532971996387e-05,
1170
+ "loss": 0.0342,
1171
+ "step": 1480
1172
+ },
1173
+ {
1174
+ "epoch": 12.113821138211382,
1175
+ "grad_norm": 0.027159936726093292,
1176
+ "learning_rate": 6.540198735320688e-05,
1177
+ "loss": 0.0033,
1178
+ "step": 1490
1179
+ },
1180
+ {
1181
+ "epoch": 12.195121951219512,
1182
+ "grad_norm": 0.028470635414123535,
1183
+ "learning_rate": 6.449864498644986e-05,
1184
+ "loss": 0.0039,
1185
+ "step": 1500
1186
+ },
1187
+ {
1188
+ "epoch": 12.195121951219512,
1189
+ "eval_accuracy": 0.7579908675799086,
1190
+ "eval_loss": 1.328336238861084,
1191
+ "eval_runtime": 1.8861,
1192
+ "eval_samples_per_second": 116.112,
1193
+ "eval_steps_per_second": 14.845,
1194
+ "step": 1500
1195
+ },
1196
+ {
1197
+ "epoch": 12.276422764227643,
1198
+ "grad_norm": 0.017821423709392548,
1199
+ "learning_rate": 6.359530261969287e-05,
1200
+ "loss": 0.0024,
1201
+ "step": 1510
1202
+ },
1203
+ {
1204
+ "epoch": 12.357723577235772,
1205
+ "grad_norm": 0.0137524688616395,
1206
+ "learning_rate": 6.269196025293587e-05,
1207
+ "loss": 0.0021,
1208
+ "step": 1520
1209
+ },
1210
+ {
1211
+ "epoch": 12.439024390243903,
1212
+ "grad_norm": 0.012154987081885338,
1213
+ "learning_rate": 6.178861788617887e-05,
1214
+ "loss": 0.0022,
1215
+ "step": 1530
1216
+ },
1217
+ {
1218
+ "epoch": 12.520325203252032,
1219
+ "grad_norm": 0.012079431675374508,
1220
+ "learning_rate": 6.0885275519421857e-05,
1221
+ "loss": 0.0021,
1222
+ "step": 1540
1223
+ },
1224
+ {
1225
+ "epoch": 12.601626016260163,
1226
+ "grad_norm": 0.013092798180878162,
1227
+ "learning_rate": 5.998193315266486e-05,
1228
+ "loss": 0.0021,
1229
+ "step": 1550
1230
+ },
1231
+ {
1232
+ "epoch": 12.682926829268293,
1233
+ "grad_norm": 0.036075592041015625,
1234
+ "learning_rate": 5.9078590785907865e-05,
1235
+ "loss": 0.0108,
1236
+ "step": 1560
1237
+ },
1238
+ {
1239
+ "epoch": 12.764227642276422,
1240
+ "grad_norm": 0.0120172630995512,
1241
+ "learning_rate": 5.817524841915086e-05,
1242
+ "loss": 0.0029,
1243
+ "step": 1570
1244
+ },
1245
+ {
1246
+ "epoch": 12.845528455284553,
1247
+ "grad_norm": 0.012298657558858395,
1248
+ "learning_rate": 5.7271906052393866e-05,
1249
+ "loss": 0.0099,
1250
+ "step": 1580
1251
+ },
1252
+ {
1253
+ "epoch": 12.926829268292684,
1254
+ "grad_norm": 0.016028335317969322,
1255
+ "learning_rate": 5.6368563685636857e-05,
1256
+ "loss": 0.0102,
1257
+ "step": 1590
1258
+ },
1259
+ {
1260
+ "epoch": 13.008130081300813,
1261
+ "grad_norm": 0.011840942315757275,
1262
+ "learning_rate": 5.5465221318879854e-05,
1263
+ "loss": 0.002,
1264
+ "step": 1600
1265
+ },
1266
+ {
1267
+ "epoch": 13.008130081300813,
1268
+ "eval_accuracy": 0.7671232876712328,
1269
+ "eval_loss": 1.1871178150177002,
1270
+ "eval_runtime": 1.8919,
1271
+ "eval_samples_per_second": 115.755,
1272
+ "eval_steps_per_second": 14.8,
1273
+ "step": 1600
1274
+ },
1275
+ {
1276
+ "epoch": 13.089430894308943,
1277
+ "grad_norm": 0.011245610192418098,
1278
+ "learning_rate": 5.456187895212286e-05,
1279
+ "loss": 0.0019,
1280
+ "step": 1610
1281
+ },
1282
+ {
1283
+ "epoch": 13.170731707317072,
1284
+ "grad_norm": 0.010579339228570461,
1285
+ "learning_rate": 5.365853658536586e-05,
1286
+ "loss": 0.0019,
1287
+ "step": 1620
1288
+ },
1289
+ {
1290
+ "epoch": 13.252032520325203,
1291
+ "grad_norm": 0.5460268259048462,
1292
+ "learning_rate": 5.275519421860885e-05,
1293
+ "loss": 0.0021,
1294
+ "step": 1630
1295
+ },
1296
+ {
1297
+ "epoch": 13.333333333333334,
1298
+ "grad_norm": 0.011547455564141273,
1299
+ "learning_rate": 5.185185185185185e-05,
1300
+ "loss": 0.0019,
1301
+ "step": 1640
1302
+ },
1303
+ {
1304
+ "epoch": 13.414634146341463,
1305
+ "grad_norm": 0.010749176144599915,
1306
+ "learning_rate": 5.0948509485094854e-05,
1307
+ "loss": 0.0018,
1308
+ "step": 1650
1309
+ },
1310
+ {
1311
+ "epoch": 13.495934959349594,
1312
+ "grad_norm": 0.020905395969748497,
1313
+ "learning_rate": 5.004516711833785e-05,
1314
+ "loss": 0.0019,
1315
+ "step": 1660
1316
+ },
1317
+ {
1318
+ "epoch": 13.577235772357724,
1319
+ "grad_norm": 0.010568364523351192,
1320
+ "learning_rate": 4.914182475158085e-05,
1321
+ "loss": 0.0018,
1322
+ "step": 1670
1323
+ },
1324
+ {
1325
+ "epoch": 13.658536585365853,
1326
+ "grad_norm": 0.010385628789663315,
1327
+ "learning_rate": 4.823848238482385e-05,
1328
+ "loss": 0.0457,
1329
+ "step": 1680
1330
+ },
1331
+ {
1332
+ "epoch": 13.739837398373984,
1333
+ "grad_norm": 0.011808566749095917,
1334
+ "learning_rate": 4.733514001806685e-05,
1335
+ "loss": 0.0019,
1336
+ "step": 1690
1337
+ },
1338
+ {
1339
+ "epoch": 13.821138211382113,
1340
+ "grad_norm": 0.01071181334555149,
1341
+ "learning_rate": 4.643179765130985e-05,
1342
+ "loss": 0.0019,
1343
+ "step": 1700
1344
+ },
1345
+ {
1346
+ "epoch": 13.821138211382113,
1347
+ "eval_accuracy": 0.7899543378995434,
1348
+ "eval_loss": 1.1642991304397583,
1349
+ "eval_runtime": 2.6737,
1350
+ "eval_samples_per_second": 81.908,
1351
+ "eval_steps_per_second": 10.472,
1352
+ "step": 1700
1353
+ },
1354
+ {
1355
+ "epoch": 13.902439024390244,
1356
+ "grad_norm": 0.009974062442779541,
1357
+ "learning_rate": 4.5528455284552844e-05,
1358
+ "loss": 0.0019,
1359
+ "step": 1710
1360
+ },
1361
+ {
1362
+ "epoch": 13.983739837398375,
1363
+ "grad_norm": 0.010520576499402523,
1364
+ "learning_rate": 4.462511291779585e-05,
1365
+ "loss": 0.0019,
1366
+ "step": 1720
1367
+ },
1368
+ {
1369
+ "epoch": 14.065040650406504,
1370
+ "grad_norm": 0.010491529479622841,
1371
+ "learning_rate": 4.3721770551038846e-05,
1372
+ "loss": 0.0018,
1373
+ "step": 1730
1374
+ },
1375
+ {
1376
+ "epoch": 14.146341463414634,
1377
+ "grad_norm": 0.010812795720994473,
1378
+ "learning_rate": 4.281842818428184e-05,
1379
+ "loss": 0.0018,
1380
+ "step": 1740
1381
+ },
1382
+ {
1383
+ "epoch": 14.227642276422765,
1384
+ "grad_norm": 0.010520540177822113,
1385
+ "learning_rate": 4.191508581752485e-05,
1386
+ "loss": 0.0017,
1387
+ "step": 1750
1388
+ },
1389
+ {
1390
+ "epoch": 14.308943089430894,
1391
+ "grad_norm": 0.010174380615353584,
1392
+ "learning_rate": 4.1011743450767844e-05,
1393
+ "loss": 0.0017,
1394
+ "step": 1760
1395
+ },
1396
+ {
1397
+ "epoch": 14.390243902439025,
1398
+ "grad_norm": 0.014143481850624084,
1399
+ "learning_rate": 4.010840108401084e-05,
1400
+ "loss": 0.0018,
1401
+ "step": 1770
1402
+ },
1403
+ {
1404
+ "epoch": 14.471544715447154,
1405
+ "grad_norm": 0.010679790750145912,
1406
+ "learning_rate": 3.920505871725384e-05,
1407
+ "loss": 0.0017,
1408
+ "step": 1780
1409
+ },
1410
+ {
1411
+ "epoch": 14.552845528455284,
1412
+ "grad_norm": 0.009599764831364155,
1413
+ "learning_rate": 3.830171635049684e-05,
1414
+ "loss": 0.0017,
1415
+ "step": 1790
1416
+ },
1417
+ {
1418
+ "epoch": 14.634146341463415,
1419
+ "grad_norm": 0.011851229704916477,
1420
+ "learning_rate": 3.739837398373984e-05,
1421
+ "loss": 0.0264,
1422
+ "step": 1800
1423
+ },
1424
+ {
1425
+ "epoch": 14.634146341463415,
1426
+ "eval_accuracy": 0.7899543378995434,
1427
+ "eval_loss": 1.1537418365478516,
1428
+ "eval_runtime": 2.1352,
1429
+ "eval_samples_per_second": 102.565,
1430
+ "eval_steps_per_second": 13.113,
1431
+ "step": 1800
1432
+ },
1433
+ {
1434
+ "epoch": 14.715447154471544,
1435
+ "grad_norm": 0.010717163793742657,
1436
+ "learning_rate": 3.649503161698284e-05,
1437
+ "loss": 0.0017,
1438
+ "step": 1810
1439
+ },
1440
+ {
1441
+ "epoch": 14.796747967479675,
1442
+ "grad_norm": 0.00999755784869194,
1443
+ "learning_rate": 3.5591689250225835e-05,
1444
+ "loss": 0.0017,
1445
+ "step": 1820
1446
+ },
1447
+ {
1448
+ "epoch": 14.878048780487806,
1449
+ "grad_norm": 0.01001573447138071,
1450
+ "learning_rate": 3.468834688346884e-05,
1451
+ "loss": 0.0018,
1452
+ "step": 1830
1453
+ },
1454
+ {
1455
+ "epoch": 14.959349593495935,
1456
+ "grad_norm": 0.013388333842158318,
1457
+ "learning_rate": 3.3785004516711836e-05,
1458
+ "loss": 0.0017,
1459
+ "step": 1840
1460
+ },
1461
+ {
1462
+ "epoch": 15.040650406504065,
1463
+ "grad_norm": 0.0097127016633749,
1464
+ "learning_rate": 3.2881662149954834e-05,
1465
+ "loss": 0.0017,
1466
+ "step": 1850
1467
+ },
1468
+ {
1469
+ "epoch": 15.121951219512194,
1470
+ "grad_norm": 0.009287914261221886,
1471
+ "learning_rate": 3.197831978319784e-05,
1472
+ "loss": 0.0016,
1473
+ "step": 1860
1474
+ },
1475
+ {
1476
+ "epoch": 15.203252032520325,
1477
+ "grad_norm": 0.009506735019385815,
1478
+ "learning_rate": 3.107497741644083e-05,
1479
+ "loss": 0.0016,
1480
+ "step": 1870
1481
+ },
1482
+ {
1483
+ "epoch": 15.284552845528456,
1484
+ "grad_norm": 0.009844713844358921,
1485
+ "learning_rate": 3.0171635049683832e-05,
1486
+ "loss": 0.0016,
1487
+ "step": 1880
1488
+ },
1489
+ {
1490
+ "epoch": 15.365853658536585,
1491
+ "grad_norm": 0.00882001779973507,
1492
+ "learning_rate": 2.926829268292683e-05,
1493
+ "loss": 0.0016,
1494
+ "step": 1890
1495
+ },
1496
+ {
1497
+ "epoch": 15.447154471544716,
1498
+ "grad_norm": 0.008611707016825676,
1499
+ "learning_rate": 2.836495031616983e-05,
1500
+ "loss": 0.0015,
1501
+ "step": 1900
1502
+ },
1503
+ {
1504
+ "epoch": 15.447154471544716,
1505
+ "eval_accuracy": 0.7945205479452054,
1506
+ "eval_loss": 1.182112216949463,
1507
+ "eval_runtime": 2.5496,
1508
+ "eval_samples_per_second": 85.896,
1509
+ "eval_steps_per_second": 10.982,
1510
+ "step": 1900
1511
+ },
1512
+ {
1513
+ "epoch": 15.528455284552846,
1514
+ "grad_norm": 0.008707600645720959,
1515
+ "learning_rate": 2.7461607949412827e-05,
1516
+ "loss": 0.0015,
1517
+ "step": 1910
1518
+ },
1519
+ {
1520
+ "epoch": 15.609756097560975,
1521
+ "grad_norm": 0.008637920022010803,
1522
+ "learning_rate": 2.6558265582655828e-05,
1523
+ "loss": 0.0015,
1524
+ "step": 1920
1525
+ },
1526
+ {
1527
+ "epoch": 15.691056910569106,
1528
+ "grad_norm": 0.008965054526925087,
1529
+ "learning_rate": 2.565492321589883e-05,
1530
+ "loss": 0.0015,
1531
+ "step": 1930
1532
+ },
1533
+ {
1534
+ "epoch": 15.772357723577235,
1535
+ "grad_norm": 0.008836538530886173,
1536
+ "learning_rate": 2.4751580849141826e-05,
1537
+ "loss": 0.0015,
1538
+ "step": 1940
1539
+ },
1540
+ {
1541
+ "epoch": 15.853658536585366,
1542
+ "grad_norm": 0.008493401110172272,
1543
+ "learning_rate": 2.3848238482384823e-05,
1544
+ "loss": 0.0015,
1545
+ "step": 1950
1546
+ },
1547
+ {
1548
+ "epoch": 15.934959349593496,
1549
+ "grad_norm": 0.008783240802586079,
1550
+ "learning_rate": 2.2944896115627824e-05,
1551
+ "loss": 0.0015,
1552
+ "step": 1960
1553
+ },
1554
+ {
1555
+ "epoch": 16.016260162601625,
1556
+ "grad_norm": 0.008636926300823689,
1557
+ "learning_rate": 2.204155374887082e-05,
1558
+ "loss": 0.0015,
1559
+ "step": 1970
1560
+ },
1561
+ {
1562
+ "epoch": 16.097560975609756,
1563
+ "grad_norm": 0.00970914401113987,
1564
+ "learning_rate": 2.1138211382113822e-05,
1565
+ "loss": 0.0015,
1566
+ "step": 1980
1567
+ },
1568
+ {
1569
+ "epoch": 16.178861788617887,
1570
+ "grad_norm": 0.008323701098561287,
1571
+ "learning_rate": 2.0234869015356823e-05,
1572
+ "loss": 0.0015,
1573
+ "step": 1990
1574
+ },
1575
+ {
1576
+ "epoch": 16.260162601626018,
1577
+ "grad_norm": 0.008794574066996574,
1578
+ "learning_rate": 1.933152664859982e-05,
1579
+ "loss": 0.0015,
1580
+ "step": 2000
1581
+ },
1582
+ {
1583
+ "epoch": 16.260162601626018,
1584
+ "eval_accuracy": 0.7899543378995434,
1585
+ "eval_loss": 1.196179986000061,
1586
+ "eval_runtime": 2.2597,
1587
+ "eval_samples_per_second": 96.915,
1588
+ "eval_steps_per_second": 12.391,
1589
+ "step": 2000
1590
+ },
1591
+ {
1592
+ "epoch": 16.341463414634145,
1593
+ "grad_norm": 0.009114695712924004,
1594
+ "learning_rate": 1.842818428184282e-05,
1595
+ "loss": 0.0015,
1596
+ "step": 2010
1597
+ },
1598
+ {
1599
+ "epoch": 16.422764227642276,
1600
+ "grad_norm": 0.008416150696575642,
1601
+ "learning_rate": 1.7524841915085818e-05,
1602
+ "loss": 0.0014,
1603
+ "step": 2020
1604
+ },
1605
+ {
1606
+ "epoch": 16.504065040650406,
1607
+ "grad_norm": 0.008963138796389103,
1608
+ "learning_rate": 1.662149954832882e-05,
1609
+ "loss": 0.0014,
1610
+ "step": 2030
1611
+ },
1612
+ {
1613
+ "epoch": 16.585365853658537,
1614
+ "grad_norm": 0.008162124082446098,
1615
+ "learning_rate": 1.5718157181571816e-05,
1616
+ "loss": 0.0014,
1617
+ "step": 2040
1618
+ },
1619
+ {
1620
+ "epoch": 16.666666666666668,
1621
+ "grad_norm": 0.008323684334754944,
1622
+ "learning_rate": 1.4814814814814815e-05,
1623
+ "loss": 0.0014,
1624
+ "step": 2050
1625
+ },
1626
+ {
1627
+ "epoch": 16.747967479674795,
1628
+ "grad_norm": 0.00929880328476429,
1629
+ "learning_rate": 1.3911472448057814e-05,
1630
+ "loss": 0.0014,
1631
+ "step": 2060
1632
+ },
1633
+ {
1634
+ "epoch": 16.829268292682926,
1635
+ "grad_norm": 0.008141223341226578,
1636
+ "learning_rate": 1.3008130081300815e-05,
1637
+ "loss": 0.0014,
1638
+ "step": 2070
1639
+ },
1640
+ {
1641
+ "epoch": 16.910569105691057,
1642
+ "grad_norm": 0.00794750452041626,
1643
+ "learning_rate": 1.2104787714543812e-05,
1644
+ "loss": 0.0014,
1645
+ "step": 2080
1646
+ },
1647
+ {
1648
+ "epoch": 16.991869918699187,
1649
+ "grad_norm": 0.008513950742781162,
1650
+ "learning_rate": 1.1201445347786811e-05,
1651
+ "loss": 0.0014,
1652
+ "step": 2090
1653
+ },
1654
+ {
1655
+ "epoch": 17.073170731707318,
1656
+ "grad_norm": 0.008719543926417828,
1657
+ "learning_rate": 1.0298102981029812e-05,
1658
+ "loss": 0.0014,
1659
+ "step": 2100
1660
+ },
1661
+ {
1662
+ "epoch": 17.073170731707318,
1663
+ "eval_accuracy": 0.7899543378995434,
1664
+ "eval_loss": 1.2036298513412476,
1665
+ "eval_runtime": 1.882,
1666
+ "eval_samples_per_second": 116.365,
1667
+ "eval_steps_per_second": 14.878,
1668
+ "step": 2100
1669
+ },
1670
+ {
1671
+ "epoch": 17.15447154471545,
1672
+ "grad_norm": 0.007771148346364498,
1673
+ "learning_rate": 9.39476061427281e-06,
1674
+ "loss": 0.0014,
1675
+ "step": 2110
1676
+ },
1677
+ {
1678
+ "epoch": 17.235772357723576,
1679
+ "grad_norm": 0.008955016732215881,
1680
+ "learning_rate": 8.49141824751581e-06,
1681
+ "loss": 0.0014,
1682
+ "step": 2120
1683
+ },
1684
+ {
1685
+ "epoch": 17.317073170731707,
1686
+ "grad_norm": 0.008788557723164558,
1687
+ "learning_rate": 7.588075880758808e-06,
1688
+ "loss": 0.0014,
1689
+ "step": 2130
1690
+ },
1691
+ {
1692
+ "epoch": 17.398373983739837,
1693
+ "grad_norm": 0.007935418747365475,
1694
+ "learning_rate": 6.684733514001807e-06,
1695
+ "loss": 0.0014,
1696
+ "step": 2140
1697
+ },
1698
+ {
1699
+ "epoch": 17.479674796747968,
1700
+ "grad_norm": 0.007881587371230125,
1701
+ "learning_rate": 5.781391147244806e-06,
1702
+ "loss": 0.0014,
1703
+ "step": 2150
1704
+ },
1705
+ {
1706
+ "epoch": 17.5609756097561,
1707
+ "grad_norm": 0.0086191575974226,
1708
+ "learning_rate": 4.8780487804878055e-06,
1709
+ "loss": 0.0014,
1710
+ "step": 2160
1711
+ },
1712
+ {
1713
+ "epoch": 17.642276422764226,
1714
+ "grad_norm": 0.009581954218447208,
1715
+ "learning_rate": 3.9747064137308045e-06,
1716
+ "loss": 0.0014,
1717
+ "step": 2170
1718
+ },
1719
+ {
1720
+ "epoch": 17.723577235772357,
1721
+ "grad_norm": 0.009051047265529633,
1722
+ "learning_rate": 3.071364046973803e-06,
1723
+ "loss": 0.0014,
1724
+ "step": 2180
1725
+ },
1726
+ {
1727
+ "epoch": 17.804878048780488,
1728
+ "grad_norm": 0.008541719056665897,
1729
+ "learning_rate": 2.1680216802168024e-06,
1730
+ "loss": 0.0014,
1731
+ "step": 2190
1732
+ },
1733
+ {
1734
+ "epoch": 17.88617886178862,
1735
+ "grad_norm": 0.007614914793521166,
1736
+ "learning_rate": 1.2646793134598014e-06,
1737
+ "loss": 0.0014,
1738
+ "step": 2200
1739
+ },
1740
+ {
1741
+ "epoch": 17.88617886178862,
1742
+ "eval_accuracy": 0.7899543378995434,
1743
+ "eval_loss": 1.2066991329193115,
1744
+ "eval_runtime": 1.8765,
1745
+ "eval_samples_per_second": 116.706,
1746
+ "eval_steps_per_second": 14.921,
1747
+ "step": 2200
1748
+ },
1749
+ {
1750
+ "epoch": 17.96747967479675,
1751
+ "grad_norm": 0.008516514673829079,
1752
+ "learning_rate": 3.6133694670280035e-07,
1753
+ "loss": 0.0014,
1754
+ "step": 2210
1755
+ },
1756
+ {
1757
+ "epoch": 18.0,
1758
+ "step": 2214,
1759
+ "total_flos": 2.739521370098516e+18,
1760
+ "train_loss": 0.17171474754101115,
1761
+ "train_runtime": 1017.2571,
1762
+ "train_samples_per_second": 34.752,
1763
+ "train_steps_per_second": 2.176
1764
+ }
1765
+ ],
1766
+ "logging_steps": 10,
1767
+ "max_steps": 2214,
1768
+ "num_input_tokens_seen": 0,
1769
+ "num_train_epochs": 18,
1770
+ "save_steps": 100,
1771
+ "stateful_callbacks": {
1772
+ "TrainerControl": {
1773
+ "args": {
1774
+ "should_epoch_stop": false,
1775
+ "should_evaluate": false,
1776
+ "should_log": false,
1777
+ "should_save": true,
1778
+ "should_training_stop": false
1779
+ },
1780
+ "attributes": {}
1781
+ }
1782
+ },
1783
+ "total_flos": 2.739521370098516e+18,
1784
+ "train_batch_size": 16,
1785
+ "trial_name": null,
1786
+ "trial_params": null
1787
+ }
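The `log_history` array in trainer_state.json interleaves training-loss entries (every 10 steps) with evaluation entries (every 100 steps, the ones carrying `eval_accuracy`). A minimal sketch for extracting the evaluation curve, assuming the file has been downloaded to the working directory:

```python
# Sketch: read trainer_state.json and pull out the eval-accuracy curve from
# log_history. Eval entries are identified by the presence of "eval_accuracy".
import json

with open("trainer_state.json") as f:
    state = json.load(f)

eval_points = [(e["step"], e["eval_accuracy"])
               for e in state["log_history"] if "eval_accuracy" in e]

for step, acc in eval_points:
    print(f"step {step:>5}: accuracy {acc:.4f}")

best = max(eval_points, key=lambda p: p[1])
print(f"best observed accuracy {best[1]:.4f} at step {best[0]}")
```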