afraid15chicken committed on
Commit cee0969 · verified · 1 Parent(s): adb3b38

🍻 cheers

README.md CHANGED
@@ -3,6 +3,7 @@ library_name: transformers
  license: apache-2.0
  base_model: google/vit-base-patch16-224-in21k
  tags:
+ - image-classification
  - generated_from_trainer
  datasets:
  - imagefolder
@@ -15,7 +16,7 @@ model-index:
  name: Image Classification
  type: image-classification
  dataset:
- name: imagefolder
+ name: indian_food_images
  type: imagefolder
  config: default
  split: train
@@ -23,7 +24,7 @@ model-index:
  metrics:
  - name: Accuracy
  type: accuracy
- value: 0.9986902423051736
+ value: 0.9993451211525868
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -31,10 +32,10 @@ should probably proofread and complete it, then remove this comment. -->

  # finetuned-arsenic

- This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) on the imagefolder dataset.
+ This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) on the indian_food_images dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.0066
- - Accuracy: 0.9987
+ - Loss: 0.0048
+ - Accuracy: 0.9993

  ## Model description

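The accuracy edit in this diff is smaller than it looks. Multiplying eval_samples_per_second by eval_runtime from eval_results.json (added in this same commit) gives an evaluation split of about 1527 images. This is an estimate, because both figures are rounded. Both the old and the new accuracy are then exact fractions of that count: 1525/1527 and 1526/1527. Assuming the previous card was scored on a split of the same size (an assumption, since the old run's metrics files are not shown), the update amounts to one more correctly classified image. The helper names below are illustrative only:

```python
# Figures copied from eval_results.json and the README diff in this commit.
EVAL_SAMPLES_PER_SECOND = 28.614
EVAL_RUNTIME_S = 53.3656
OLD_ACCURACY = 0.9986902423051736   # previous model card
NEW_ACCURACY = 0.9993451211525868   # updated model card


def estimate_eval_size(samples_per_second: float, runtime_s: float) -> int:
    """Estimate how many images were evaluated (throughput x runtime, rounded)."""
    return round(samples_per_second * runtime_s)


def correct_predictions(accuracy: float, n: int) -> int:
    """Nearest whole number of correct predictions for an accuracy over n images."""
    return round(accuracy * n)


n = estimate_eval_size(EVAL_SAMPLES_PER_SECOND, EVAL_RUNTIME_S)
old_correct = correct_predictions(OLD_ACCURACY, n)
new_correct = correct_predictions(NEW_ACCURACY, n)
print(n, old_correct, new_correct)  # 1527 1525 1526
```

At this level, the fourth decimal of accuracy on this split moves in steps of about 0.00065 per image.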
all_results.json ADDED
@@ -0,0 +1,13 @@
+ {
+ "epoch": 4.0,
+ "eval_accuracy": 0.9993451211525868,
+ "eval_loss": 0.0047513521276414394,
+ "eval_runtime": 53.3656,
+ "eval_samples_per_second": 28.614,
+ "eval_steps_per_second": 3.579,
+ "total_flos": 2.6818427765818e+18,
+ "train_loss": 0.0841421499820822,
+ "train_runtime": 2597.595,
+ "train_samples_per_second": 13.323,
+ "train_steps_per_second": 0.833
+ }
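all_results.json only reports rates, but multiplying each rate by train_runtime recovers approximate totals. The sketch below assumes that train_samples_per_second counts samples across all four epochs. It also assumes that train_steps_per_second counts optimizer steps. The recovered step count matches global_step (2164) in trainer_state.json, which supports both assumptions. The "effective" batch size is per-device batch times devices times gradient accumulation, and these numbers cannot separate those factors. The function name is hypothetical:

```python
# Training metrics copied from all_results.json in this commit.
EPOCHS = 4.0
TRAIN_RUNTIME_S = 2597.595
TRAIN_SAMPLES_PER_S = 13.323
TRAIN_STEPS_PER_S = 0.833


def implied_training_counts(epochs, runtime_s, samples_per_s, steps_per_s):
    """Back out approximate totals from rate x runtime.

    The rates are rounded to three decimals, so every result is an estimate.
    """
    total_samples = samples_per_s * runtime_s   # samples seen over all epochs
    total_steps = steps_per_s * runtime_s       # optimizer steps
    return {
        "samples_per_epoch": round(total_samples / epochs),
        "optimizer_steps": round(total_steps),
        "effective_batch_size": round(total_samples / total_steps),
    }


counts = implied_training_counts(EPOCHS, TRAIN_RUNTIME_S, TRAIN_SAMPLES_PER_S, TRAIN_STEPS_PER_S)
print(counts)
# {'samples_per_epoch': 8652, 'optimizer_steps': 2164, 'effective_batch_size': 16}
```

Roughly 8652 training images with an effective batch of 16 gives 541 optimizer steps per epoch (the last batch is partial). That agrees with 2164 steps over 4 epochs.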
eval_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 4.0,
+ "eval_accuracy": 0.9993451211525868,
+ "eval_loss": 0.0047513521276414394,
+ "eval_runtime": 53.3656,
+ "eval_samples_per_second": 28.614,
+ "eval_steps_per_second": 3.579
+ }
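The eval_loss here (0.0047513521276414394) is identical to best_metric in trainer_state.json, which names checkpoint-2000 as the best checkpoint. This fits a run that keeps the checkpoint with the lowest eval_loss and re-evaluates it at the end, for example through `load_best_model_at_end`. The training arguments are not part of this commit, so treat that as an inference. The sketch below shows the selection rule itself (lower is better for a loss). It is applied only to the evaluations visible in the log excerpt, steps 100 to 1300. Checkpoint-2000 lies beyond that excerpt. `best_checkpoint` is a hypothetical helper, not a Trainer API:

```python
# (step, eval_loss) pairs copied from log_history in trainer_state.json.
# Only the evaluations shown in this excerpt are included.
EVAL_LOSSES = [
    (100, 0.1917603313922882),
    (200, 0.17399875819683075),
    (300, 0.078226737678051),
    (400, 0.21575002372264862),
    (500, 0.04485374689102173),
    (600, 0.1554253250360489),
    (700, 0.08448445796966553),
    (800, 0.07117750495672226),
    (900, 0.07181376963853836),
    (1000, 0.0250676441937685),
    (1100, 0.01773611083626747),
    (1200, 0.024552814662456512),
    (1300, 0.008291647769510746),
]


def best_checkpoint(evals, greater_is_better=False):
    """Return the (step, value) pair with the best metric value.

    Loss-like metrics are minimised; pass greater_is_better=True for accuracy.
    """
    choose = max if greater_is_better else min
    return choose(evals, key=lambda pair: pair[1])


step, loss = best_checkpoint(EVAL_LOSSES)
print(f"best so far: checkpoint-{step} (eval_loss={loss:.4f})")
# best so far: checkpoint-1300 (eval_loss=0.0083)
```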
runs/Oct07_16-01-53_efd0d9aa04b4/events.out.tfevents.1728319587.efd0d9aa04b4.3229.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:981e50a825a3773a3d80bd9d1ea2d9197665fff9d6605b4505da276baf04d7d6
+ size 411
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 4.0,
+ "total_flos": 2.6818427765818e+18,
+ "train_loss": 0.0841421499820822,
+ "train_runtime": 2597.595,
+ "train_samples_per_second": 13.323,
+ "train_steps_per_second": 0.833
+ }
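The learning_rate values in the trainer_state.json log that follows fit a linear decay from 2e-4 to zero over the 2164 optimizer steps, with no warmup. Each logged epoch equals step / 541, which is 2164 steps spread over 4 epochs. Both the peak rate and the absence of warmup are inferred from the logged numbers; the training arguments are not in this commit. A stdlib-only check against a few sampled log entries (function names are illustrative):

```python
# Inferred from the log, not stated in the commit: peak LR 2e-4, no warmup,
# linear decay to zero at the final optimizer step.
PEAK_LR = 2e-4
TOTAL_STEPS = 2164                            # "global_step" in trainer_state.json
NUM_EPOCHS = 4                                # final "epoch" is 4.0
STEPS_PER_EPOCH = TOTAL_STEPS // NUM_EPOCHS   # 541


def linear_lr(step: int, peak: float = PEAK_LR, total: int = TOTAL_STEPS) -> float:
    """Learning rate after `step` optimizer steps under linear decay without warmup."""
    return peak * max(0.0, (total - step) / total)


def fractional_epoch(step: int) -> float:
    """Epoch counter expressed as completed passes over the training set."""
    return step / STEPS_PER_EPOCH


# (step, logged learning_rate, logged epoch) copied from log_history below.
LOGGED = [
    (10, 0.0001990757855822551, 0.018484288354898338),
    (100, 0.00019075785582255082, 0.18484288354898337),
    (1090, 9.926062846580408e-05, 2.014787430683919),
    (1310, 7.89279112754159e-05, 2.421441774491682),
]

for step, logged_lr, logged_epoch in LOGGED:
    assert abs(linear_lr(step) - logged_lr) < 1e-12, step
    assert abs(fractional_epoch(step) - logged_epoch) < 1e-12, step
print("all sampled log entries match the linear schedule")
```

For example, at step 1090 the sketch gives 2e-4 × 1074/2164 ≈ 9.926e-05, the value logged for that step.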
trainer_state.json ADDED
@@ -0,0 +1,1743 @@
+ {
+ "best_metric": 0.0047513521276414394,
+ "best_model_checkpoint": "finetuned-arsenic/checkpoint-2000",
+ "epoch": 4.0,
+ "eval_steps": 100,
+ "global_step": 2164,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.018484288354898338,
+ "grad_norm": 4.949392795562744,
+ "learning_rate": 0.0001990757855822551,
+ "loss": 0.5368,
+ "step": 10
+ },
+ {
+ "epoch": 0.036968576709796676,
+ "grad_norm": 3.3969953060150146,
+ "learning_rate": 0.00019815157116451017,
+ "loss": 0.3313,
+ "step": 20
+ },
+ {
+ "epoch": 0.05545286506469501,
+ "grad_norm": 0.859575629234314,
+ "learning_rate": 0.00019722735674676528,
+ "loss": 0.5003,
+ "step": 30
+ },
+ {
+ "epoch": 0.07393715341959335,
+ "grad_norm": 5.522923946380615,
+ "learning_rate": 0.00019630314232902034,
+ "loss": 0.2564,
+ "step": 40
+ },
+ {
+ "epoch": 0.09242144177449169,
+ "grad_norm": 4.462332248687744,
+ "learning_rate": 0.00019537892791127544,
+ "loss": 0.3339,
+ "step": 50
+ },
+ {
+ "epoch": 0.11090573012939002,
+ "grad_norm": 1.6224160194396973,
+ "learning_rate": 0.0001944547134935305,
+ "loss": 0.3965,
+ "step": 60
+ },
+ {
+ "epoch": 0.12939001848428835,
+ "grad_norm": 6.097796440124512,
+ "learning_rate": 0.0001935304990757856,
+ "loss": 0.3319,
+ "step": 70
+ },
+ {
+ "epoch": 0.1478743068391867,
+ "grad_norm": 3.9769697189331055,
+ "learning_rate": 0.00019260628465804066,
+ "loss": 0.4012,
+ "step": 80
+ },
+ {
+ "epoch": 0.16635859519408502,
+ "grad_norm": 2.335510730743408,
+ "learning_rate": 0.00019168207024029577,
+ "loss": 0.4584,
+ "step": 90
+ },
+ {
+ "epoch": 0.18484288354898337,
+ "grad_norm": 3.8701980113983154,
+ "learning_rate": 0.00019075785582255082,
+ "loss": 0.1855,
+ "step": 100
+ },
+ {
+ "epoch": 0.18484288354898337,
+ "eval_accuracy": 0.931237721021611,
+ "eval_loss": 0.1917603313922882,
+ "eval_runtime": 57.9367,
+ "eval_samples_per_second": 26.356,
+ "eval_steps_per_second": 3.297,
+ "step": 100
+ },
90
+ {
91
+ "epoch": 0.2033271719038817,
92
+ "grad_norm": 2.2155282497406006,
93
+ "learning_rate": 0.00018983364140480593,
94
+ "loss": 0.2331,
95
+ "step": 110
96
+ },
97
+ {
98
+ "epoch": 0.22181146025878004,
99
+ "grad_norm": 0.9634373188018799,
100
+ "learning_rate": 0.000188909426987061,
101
+ "loss": 0.209,
102
+ "step": 120
103
+ },
104
+ {
105
+ "epoch": 0.24029574861367836,
106
+ "grad_norm": 0.2715567648410797,
107
+ "learning_rate": 0.0001879852125693161,
108
+ "loss": 0.1486,
109
+ "step": 130
110
+ },
111
+ {
112
+ "epoch": 0.2587800369685767,
113
+ "grad_norm": 12.090089797973633,
114
+ "learning_rate": 0.00018706099815157118,
115
+ "loss": 0.1629,
116
+ "step": 140
117
+ },
118
+ {
119
+ "epoch": 0.27726432532347506,
120
+ "grad_norm": 1.551562786102295,
121
+ "learning_rate": 0.00018613678373382626,
122
+ "loss": 0.1852,
123
+ "step": 150
124
+ },
125
+ {
126
+ "epoch": 0.2957486136783734,
127
+ "grad_norm": 0.775977373123169,
128
+ "learning_rate": 0.00018521256931608134,
129
+ "loss": 0.3179,
130
+ "step": 160
131
+ },
132
+ {
133
+ "epoch": 0.3142329020332717,
134
+ "grad_norm": 3.0043396949768066,
135
+ "learning_rate": 0.00018428835489833642,
136
+ "loss": 0.3842,
137
+ "step": 170
138
+ },
139
+ {
140
+ "epoch": 0.33271719038817005,
141
+ "grad_norm": 1.2949095964431763,
142
+ "learning_rate": 0.0001833641404805915,
143
+ "loss": 0.2534,
144
+ "step": 180
145
+ },
146
+ {
147
+ "epoch": 0.3512014787430684,
148
+ "grad_norm": 9.545828819274902,
149
+ "learning_rate": 0.00018243992606284658,
150
+ "loss": 0.2031,
151
+ "step": 190
152
+ },
153
+ {
154
+ "epoch": 0.36968576709796674,
155
+ "grad_norm": 0.29387930035591125,
156
+ "learning_rate": 0.0001815157116451017,
157
+ "loss": 0.1792,
158
+ "step": 200
159
+ },
160
+ {
161
+ "epoch": 0.36968576709796674,
162
+ "eval_accuracy": 0.9364767518009168,
163
+ "eval_loss": 0.17399875819683075,
164
+ "eval_runtime": 52.9831,
165
+ "eval_samples_per_second": 28.821,
166
+ "eval_steps_per_second": 3.605,
167
+ "step": 200
168
+ },
+ {
+ "epoch": 0.38817005545286504,
+ "grad_norm": 2.138578414916992,
+ "learning_rate": 0.00018059149722735675,
+ "loss": 0.2129,
+ "step": 210
+ },
+ {
+ "epoch": 0.4066543438077634,
+ "grad_norm": 2.022083282470703,
+ "learning_rate": 0.00017966728280961186,
+ "loss": 0.1577,
+ "step": 220
+ },
+ {
+ "epoch": 0.42513863216266173,
+ "grad_norm": 2.8811872005462646,
+ "learning_rate": 0.0001787430683918669,
+ "loss": 0.21,
+ "step": 230
+ },
+ {
+ "epoch": 0.4436229205175601,
+ "grad_norm": 1.491790771484375,
+ "learning_rate": 0.00017781885397412202,
+ "loss": 0.2498,
+ "step": 240
+ },
+ {
+ "epoch": 0.46210720887245843,
+ "grad_norm": 2.5274643898010254,
+ "learning_rate": 0.00017689463955637707,
+ "loss": 0.149,
+ "step": 250
+ },
+ {
+ "epoch": 0.4805914972273567,
+ "grad_norm": 0.6268563270568848,
+ "learning_rate": 0.00017597042513863218,
+ "loss": 0.1306,
+ "step": 260
+ },
+ {
+ "epoch": 0.49907578558225507,
+ "grad_norm": 6.4418511390686035,
+ "learning_rate": 0.00017504621072088724,
+ "loss": 0.1889,
+ "step": 270
+ },
+ {
+ "epoch": 0.5175600739371534,
+ "grad_norm": 0.13176225125789642,
+ "learning_rate": 0.00017412199630314234,
+ "loss": 0.1304,
+ "step": 280
+ },
+ {
+ "epoch": 0.5360443622920518,
+ "grad_norm": 1.4023276567459106,
+ "learning_rate": 0.00017319778188539743,
+ "loss": 0.0872,
+ "step": 290
+ },
+ {
+ "epoch": 0.5545286506469501,
+ "grad_norm": 5.165181636810303,
+ "learning_rate": 0.0001722735674676525,
+ "loss": 0.1688,
+ "step": 300
+ },
+ {
+ "epoch": 0.5545286506469501,
+ "eval_accuracy": 0.9692206941715783,
+ "eval_loss": 0.078226737678051,
+ "eval_runtime": 52.9719,
+ "eval_samples_per_second": 28.827,
+ "eval_steps_per_second": 3.606,
+ "step": 300
+ },
+ {
+ "epoch": 0.5730129390018485,
+ "grad_norm": 4.743193626403809,
+ "learning_rate": 0.0001713493530499076,
+ "loss": 0.1222,
+ "step": 310
+ },
+ {
+ "epoch": 0.5914972273567468,
+ "grad_norm": 3.3770973682403564,
+ "learning_rate": 0.00017042513863216267,
+ "loss": 0.2799,
+ "step": 320
+ },
+ {
+ "epoch": 0.609981515711645,
+ "grad_norm": 1.9085370302200317,
+ "learning_rate": 0.00016950092421441775,
+ "loss": 0.1779,
+ "step": 330
+ },
+ {
+ "epoch": 0.6284658040665434,
+ "grad_norm": 2.592458963394165,
+ "learning_rate": 0.00016857670979667283,
+ "loss": 0.1619,
+ "step": 340
+ },
+ {
+ "epoch": 0.6469500924214417,
+ "grad_norm": 1.1735055446624756,
+ "learning_rate": 0.00016765249537892791,
+ "loss": 0.4249,
+ "step": 350
+ },
+ {
+ "epoch": 0.6654343807763401,
+ "grad_norm": 3.8289904594421387,
+ "learning_rate": 0.000166728280961183,
+ "loss": 0.1009,
+ "step": 360
+ },
+ {
+ "epoch": 0.6839186691312384,
+ "grad_norm": 2.531283378601074,
+ "learning_rate": 0.00016580406654343808,
+ "loss": 0.1494,
+ "step": 370
+ },
+ {
+ "epoch": 0.7024029574861368,
+ "grad_norm": 0.21572425961494446,
+ "learning_rate": 0.00016487985212569316,
+ "loss": 0.0824,
+ "step": 380
+ },
+ {
+ "epoch": 0.7208872458410351,
+ "grad_norm": 3.6041758060455322,
+ "learning_rate": 0.00016395563770794827,
+ "loss": 0.1145,
+ "step": 390
+ },
+ {
+ "epoch": 0.7393715341959335,
+ "grad_norm": 0.6018674969673157,
+ "learning_rate": 0.00016303142329020332,
+ "loss": 0.1238,
+ "step": 400
+ },
+ {
+ "epoch": 0.7393715341959335,
+ "eval_accuracy": 0.922724296005239,
+ "eval_loss": 0.21575002372264862,
+ "eval_runtime": 52.6224,
+ "eval_samples_per_second": 29.018,
+ "eval_steps_per_second": 3.63,
+ "step": 400
+ },
+ {
+ "epoch": 0.7578558225508318,
+ "grad_norm": 0.25093191862106323,
+ "learning_rate": 0.00016210720887245843,
+ "loss": 0.0724,
+ "step": 410
+ },
+ {
+ "epoch": 0.7763401109057301,
+ "grad_norm": 0.2480381280183792,
+ "learning_rate": 0.00016118299445471348,
+ "loss": 0.106,
+ "step": 420
+ },
+ {
+ "epoch": 0.7948243992606284,
+ "grad_norm": 8.212138175964355,
+ "learning_rate": 0.0001602587800369686,
+ "loss": 0.1665,
+ "step": 430
+ },
+ {
+ "epoch": 0.8133086876155268,
+ "grad_norm": 0.6615661382675171,
+ "learning_rate": 0.00015933456561922367,
+ "loss": 0.0547,
+ "step": 440
+ },
+ {
+ "epoch": 0.8317929759704251,
+ "grad_norm": 4.98212194442749,
+ "learning_rate": 0.00015841035120147876,
+ "loss": 0.1982,
+ "step": 450
+ },
+ {
+ "epoch": 0.8502772643253235,
+ "grad_norm": 1.7662006616592407,
+ "learning_rate": 0.00015748613678373384,
+ "loss": 0.1402,
+ "step": 460
+ },
+ {
+ "epoch": 0.8687615526802218,
+ "grad_norm": 5.664543151855469,
+ "learning_rate": 0.00015656192236598892,
+ "loss": 0.1606,
+ "step": 470
+ },
+ {
+ "epoch": 0.8872458410351202,
+ "grad_norm": 5.662344932556152,
+ "learning_rate": 0.000155637707948244,
+ "loss": 0.0869,
+ "step": 480
+ },
+ {
+ "epoch": 0.9057301293900185,
+ "grad_norm": 1.1777679920196533,
+ "learning_rate": 0.00015471349353049908,
+ "loss": 0.0827,
+ "step": 490
+ },
+ {
+ "epoch": 0.9242144177449169,
+ "grad_norm": 0.06051797419786453,
+ "learning_rate": 0.00015378927911275416,
+ "loss": 0.0969,
+ "step": 500
+ },
+ {
+ "epoch": 0.9242144177449169,
+ "eval_accuracy": 0.9842829076620825,
+ "eval_loss": 0.04485374689102173,
+ "eval_runtime": 52.5355,
+ "eval_samples_per_second": 29.066,
+ "eval_steps_per_second": 3.636,
+ "step": 500
+ },
+ {
+ "epoch": 0.9426987060998152,
+ "grad_norm": 9.434717178344727,
+ "learning_rate": 0.00015286506469500925,
+ "loss": 0.1921,
+ "step": 510
+ },
+ {
+ "epoch": 0.9611829944547134,
+ "grad_norm": 1.619040846824646,
+ "learning_rate": 0.00015194085027726433,
+ "loss": 0.1906,
+ "step": 520
+ },
+ {
+ "epoch": 0.9796672828096118,
+ "grad_norm": 0.5532277226448059,
+ "learning_rate": 0.0001510166358595194,
+ "loss": 0.1082,
+ "step": 530
+ },
+ {
+ "epoch": 0.9981515711645101,
+ "grad_norm": 0.0866900086402893,
+ "learning_rate": 0.0001500924214417745,
+ "loss": 0.1119,
+ "step": 540
+ },
+ {
+ "epoch": 1.0166358595194085,
+ "grad_norm": 2.668076276779175,
+ "learning_rate": 0.00014916820702402957,
+ "loss": 0.143,
+ "step": 550
+ },
+ {
+ "epoch": 1.0351201478743068,
+ "grad_norm": 0.15896956622600555,
+ "learning_rate": 0.00014824399260628468,
+ "loss": 0.0378,
+ "step": 560
+ },
+ {
+ "epoch": 1.0536044362292052,
+ "grad_norm": 0.12053361535072327,
+ "learning_rate": 0.00014731977818853976,
+ "loss": 0.0528,
+ "step": 570
+ },
+ {
+ "epoch": 1.0720887245841035,
+ "grad_norm": 0.06896385550498962,
+ "learning_rate": 0.00014639556377079484,
+ "loss": 0.1663,
+ "step": 580
+ },
+ {
+ "epoch": 1.0905730129390019,
+ "grad_norm": 7.400400638580322,
+ "learning_rate": 0.00014547134935304992,
+ "loss": 0.081,
+ "step": 590
+ },
+ {
+ "epoch": 1.1090573012939002,
+ "grad_norm": 0.04029673710465431,
+ "learning_rate": 0.000144547134935305,
+ "loss": 0.0326,
+ "step": 600
+ },
+ {
+ "epoch": 1.1090573012939002,
+ "eval_accuracy": 0.9574328749181401,
+ "eval_loss": 0.1554253250360489,
+ "eval_runtime": 52.4665,
+ "eval_samples_per_second": 29.104,
+ "eval_steps_per_second": 3.64,
+ "step": 600
+ },
+ {
+ "epoch": 1.1275415896487986,
+ "grad_norm": 1.2735309600830078,
+ "learning_rate": 0.0001436229205175601,
+ "loss": 0.1339,
+ "step": 610
+ },
+ {
+ "epoch": 1.146025878003697,
+ "grad_norm": 2.2266452312469482,
+ "learning_rate": 0.00014269870609981517,
+ "loss": 0.1443,
+ "step": 620
+ },
+ {
+ "epoch": 1.1645101663585953,
+ "grad_norm": 2.932450294494629,
+ "learning_rate": 0.00014177449168207025,
+ "loss": 0.0869,
+ "step": 630
+ },
+ {
+ "epoch": 1.1829944547134936,
+ "grad_norm": 5.688024520874023,
+ "learning_rate": 0.00014085027726432533,
+ "loss": 0.091,
+ "step": 640
+ },
+ {
+ "epoch": 1.201478743068392,
+ "grad_norm": 0.04643339663743973,
+ "learning_rate": 0.0001399260628465804,
+ "loss": 0.0433,
+ "step": 650
+ },
+ {
+ "epoch": 1.21996303142329,
+ "grad_norm": 0.38614460825920105,
+ "learning_rate": 0.0001390018484288355,
+ "loss": 0.0514,
+ "step": 660
+ },
+ {
+ "epoch": 1.2384473197781884,
+ "grad_norm": 0.03372357785701752,
+ "learning_rate": 0.00013807763401109058,
+ "loss": 0.0826,
+ "step": 670
+ },
+ {
+ "epoch": 1.2569316081330868,
+ "grad_norm": 0.7059990763664246,
+ "learning_rate": 0.00013715341959334566,
+ "loss": 0.1309,
+ "step": 680
+ },
+ {
+ "epoch": 1.2754158964879851,
+ "grad_norm": 1.5385607481002808,
+ "learning_rate": 0.00013622920517560074,
+ "loss": 0.115,
+ "step": 690
+ },
+ {
+ "epoch": 1.2939001848428835,
+ "grad_norm": 1.647644281387329,
+ "learning_rate": 0.00013530499075785582,
+ "loss": 0.1057,
+ "step": 700
+ },
+ {
+ "epoch": 1.2939001848428835,
+ "eval_accuracy": 0.9738048461034708,
+ "eval_loss": 0.08448445796966553,
+ "eval_runtime": 52.7705,
+ "eval_samples_per_second": 28.937,
+ "eval_steps_per_second": 3.619,
+ "step": 700
+ },
+ {
+ "epoch": 1.3123844731977818,
+ "grad_norm": 0.8896564841270447,
+ "learning_rate": 0.0001343807763401109,
+ "loss": 0.1076,
+ "step": 710
+ },
+ {
+ "epoch": 1.3308687615526802,
+ "grad_norm": 0.9722292423248291,
+ "learning_rate": 0.000133456561922366,
+ "loss": 0.1285,
+ "step": 720
+ },
+ {
+ "epoch": 1.3493530499075785,
+ "grad_norm": 3.9030041694641113,
+ "learning_rate": 0.00013253234750462106,
+ "loss": 0.1367,
+ "step": 730
+ },
+ {
+ "epoch": 1.3678373382624769,
+ "grad_norm": 1.199768304824829,
+ "learning_rate": 0.00013160813308687617,
+ "loss": 0.088,
+ "step": 740
+ },
+ {
+ "epoch": 1.3863216266173752,
+ "grad_norm": 0.8339413404464722,
+ "learning_rate": 0.00013068391866913125,
+ "loss": 0.0481,
+ "step": 750
+ },
+ {
+ "epoch": 1.4048059149722736,
+ "grad_norm": 2.3673453330993652,
+ "learning_rate": 0.00012975970425138634,
+ "loss": 0.0698,
+ "step": 760
+ },
+ {
+ "epoch": 1.423290203327172,
+ "grad_norm": 0.042785417288541794,
+ "learning_rate": 0.00012883548983364142,
+ "loss": 0.0179,
+ "step": 770
+ },
+ {
+ "epoch": 1.4417744916820703,
+ "grad_norm": 2.720048189163208,
+ "learning_rate": 0.0001279112754158965,
+ "loss": 0.0996,
+ "step": 780
+ },
+ {
+ "epoch": 1.4602587800369686,
+ "grad_norm": 16.840740203857422,
+ "learning_rate": 0.00012698706099815158,
+ "loss": 0.0707,
+ "step": 790
+ },
+ {
+ "epoch": 1.478743068391867,
+ "grad_norm": 0.1579107642173767,
+ "learning_rate": 0.00012606284658040666,
+ "loss": 0.0805,
+ "step": 800
+ },
+ {
+ "epoch": 1.478743068391867,
+ "eval_accuracy": 0.9823182711198428,
+ "eval_loss": 0.07117750495672226,
+ "eval_runtime": 53.0346,
+ "eval_samples_per_second": 28.793,
+ "eval_steps_per_second": 3.601,
+ "step": 800
+ },
+ {
+ "epoch": 1.4972273567467653,
+ "grad_norm": 7.252885341644287,
+ "learning_rate": 0.00012513863216266174,
+ "loss": 0.0848,
+ "step": 810
+ },
+ {
+ "epoch": 1.5157116451016637,
+ "grad_norm": 0.25338369607925415,
+ "learning_rate": 0.00012421441774491682,
+ "loss": 0.0689,
+ "step": 820
+ },
+ {
+ "epoch": 1.5341959334565618,
+ "grad_norm": 3.66860032081604,
+ "learning_rate": 0.0001232902033271719,
+ "loss": 0.041,
+ "step": 830
+ },
+ {
+ "epoch": 1.5526802218114604,
+ "grad_norm": 9.176445960998535,
+ "learning_rate": 0.000122365988909427,
+ "loss": 0.111,
+ "step": 840
+ },
+ {
+ "epoch": 1.5711645101663585,
+ "grad_norm": 0.032652150839567184,
+ "learning_rate": 0.00012144177449168208,
+ "loss": 0.0519,
+ "step": 850
+ },
+ {
+ "epoch": 1.589648798521257,
+ "grad_norm": 0.054165273904800415,
+ "learning_rate": 0.00012051756007393715,
+ "loss": 0.0661,
+ "step": 860
+ },
+ {
+ "epoch": 1.6081330868761552,
+ "grad_norm": 0.10612482577562332,
+ "learning_rate": 0.00011959334565619225,
+ "loss": 0.0157,
+ "step": 870
+ },
+ {
+ "epoch": 1.6266173752310538,
+ "grad_norm": 0.7138892412185669,
+ "learning_rate": 0.00011866913123844731,
+ "loss": 0.1159,
+ "step": 880
+ },
+ {
+ "epoch": 1.645101663585952,
+ "grad_norm": 0.0576617456972599,
+ "learning_rate": 0.00011774491682070241,
+ "loss": 0.1059,
+ "step": 890
+ },
+ {
+ "epoch": 1.6635859519408502,
+ "grad_norm": 2.485743999481201,
+ "learning_rate": 0.00011682070240295748,
+ "loss": 0.0889,
+ "step": 900
+ },
+ {
+ "epoch": 1.6635859519408502,
+ "eval_accuracy": 0.9796987557301899,
+ "eval_loss": 0.07181376963853836,
+ "eval_runtime": 53.7952,
+ "eval_samples_per_second": 28.385,
+ "eval_steps_per_second": 3.551,
+ "step": 900
+ },
+ {
+ "epoch": 1.6820702402957486,
+ "grad_norm": 0.25389525294303894,
+ "learning_rate": 0.00011589648798521257,
+ "loss": 0.0478,
+ "step": 910
+ },
+ {
+ "epoch": 1.700554528650647,
+ "grad_norm": 0.040639039129018784,
+ "learning_rate": 0.00011497227356746765,
+ "loss": 0.0579,
+ "step": 920
+ },
+ {
+ "epoch": 1.7190388170055453,
+ "grad_norm": 0.04252118989825249,
+ "learning_rate": 0.00011404805914972275,
+ "loss": 0.0414,
+ "step": 930
+ },
+ {
+ "epoch": 1.7375231053604436,
+ "grad_norm": 0.03039310872554779,
+ "learning_rate": 0.00011312384473197783,
+ "loss": 0.1247,
+ "step": 940
+ },
+ {
+ "epoch": 1.756007393715342,
+ "grad_norm": 0.04092634469270706,
+ "learning_rate": 0.00011219963031423291,
+ "loss": 0.0485,
+ "step": 950
+ },
+ {
+ "epoch": 1.7744916820702403,
+ "grad_norm": 0.02784869633615017,
+ "learning_rate": 0.000111275415896488,
+ "loss": 0.044,
+ "step": 960
+ },
+ {
+ "epoch": 1.7929759704251387,
+ "grad_norm": 0.6377788186073303,
+ "learning_rate": 0.00011035120147874307,
+ "loss": 0.0833,
+ "step": 970
+ },
+ {
+ "epoch": 1.8114602587800368,
+ "grad_norm": 0.0410403273999691,
+ "learning_rate": 0.00010942698706099817,
+ "loss": 0.0079,
+ "step": 980
+ },
+ {
+ "epoch": 1.8299445471349354,
+ "grad_norm": 0.16617639362812042,
+ "learning_rate": 0.00010850277264325324,
+ "loss": 0.0562,
+ "step": 990
+ },
+ {
+ "epoch": 1.8484288354898335,
+ "grad_norm": 6.131214141845703,
+ "learning_rate": 0.00010757855822550833,
+ "loss": 0.0503,
+ "step": 1000
+ },
+ {
+ "epoch": 1.8484288354898335,
+ "eval_accuracy": 0.9934512115258677,
+ "eval_loss": 0.0250676441937685,
+ "eval_runtime": 53.2731,
+ "eval_samples_per_second": 28.664,
+ "eval_steps_per_second": 3.585,
+ "step": 1000
+ },
+ {
+ "epoch": 1.866913123844732,
+ "grad_norm": 0.07335863262414932,
+ "learning_rate": 0.0001066543438077634,
+ "loss": 0.0444,
+ "step": 1010
+ },
+ {
+ "epoch": 1.8853974121996302,
+ "grad_norm": 0.034475117921829224,
+ "learning_rate": 0.0001057301293900185,
+ "loss": 0.0513,
+ "step": 1020
+ },
+ {
+ "epoch": 1.9038817005545288,
+ "grad_norm": 0.035967420786619186,
+ "learning_rate": 0.00010480591497227356,
+ "loss": 0.0669,
+ "step": 1030
+ },
+ {
+ "epoch": 1.922365988909427,
+ "grad_norm": 0.029034554958343506,
+ "learning_rate": 0.00010388170055452866,
+ "loss": 0.0278,
+ "step": 1040
+ },
+ {
+ "epoch": 1.9408502772643255,
+ "grad_norm": 3.698307514190674,
+ "learning_rate": 0.00010295748613678373,
+ "loss": 0.0547,
+ "step": 1050
+ },
+ {
+ "epoch": 1.9593345656192236,
+ "grad_norm": 0.040026549249887466,
+ "learning_rate": 0.00010203327171903882,
+ "loss": 0.0065,
+ "step": 1060
+ },
+ {
+ "epoch": 1.9778188539741222,
+ "grad_norm": 3.3067240715026855,
+ "learning_rate": 0.0001011090573012939,
+ "loss": 0.0828,
+ "step": 1070
+ },
+ {
+ "epoch": 1.9963031423290203,
+ "grad_norm": 0.05000556632876396,
+ "learning_rate": 0.000100184842883549,
+ "loss": 0.0632,
+ "step": 1080
+ },
+ {
+ "epoch": 2.014787430683919,
+ "grad_norm": 0.04542790353298187,
+ "learning_rate": 9.926062846580408e-05,
+ "loss": 0.0682,
+ "step": 1090
+ },
+ {
+ "epoch": 2.033271719038817,
+ "grad_norm": 0.030154038220643997,
+ "learning_rate": 9.833641404805916e-05,
+ "loss": 0.0225,
+ "step": 1100
+ },
+ {
+ "epoch": 2.033271719038817,
+ "eval_accuracy": 0.9967256057629339,
+ "eval_loss": 0.01773611083626747,
+ "eval_runtime": 52.5689,
+ "eval_samples_per_second": 29.048,
+ "eval_steps_per_second": 3.633,
+ "step": 1100
+ },
+ {
+ "epoch": 2.0517560073937156,
+ "grad_norm": 0.3824068307876587,
+ "learning_rate": 9.741219963031424e-05,
+ "loss": 0.0194,
+ "step": 1110
+ },
+ {
+ "epoch": 2.0702402957486137,
+ "grad_norm": 0.020000776275992393,
+ "learning_rate": 9.648798521256932e-05,
+ "loss": 0.0259,
+ "step": 1120
+ },
+ {
+ "epoch": 2.088724584103512,
+ "grad_norm": 3.488415241241455,
+ "learning_rate": 9.55637707948244e-05,
+ "loss": 0.0629,
+ "step": 1130
+ },
+ {
+ "epoch": 2.1072088724584104,
+ "grad_norm": 10.373331069946289,
+ "learning_rate": 9.463955637707949e-05,
+ "loss": 0.015,
+ "step": 1140
+ },
+ {
+ "epoch": 2.1256931608133085,
+ "grad_norm": 0.23100066184997559,
+ "learning_rate": 9.371534195933457e-05,
+ "loss": 0.0619,
+ "step": 1150
+ },
+ {
+ "epoch": 2.144177449168207,
+ "grad_norm": 0.07692666351795197,
+ "learning_rate": 9.279112754158965e-05,
+ "loss": 0.06,
+ "step": 1160
+ },
+ {
+ "epoch": 2.162661737523105,
+ "grad_norm": 0.057554759085178375,
+ "learning_rate": 9.186691312384473e-05,
+ "loss": 0.0079,
+ "step": 1170
+ },
+ {
+ "epoch": 2.1811460258780038,
+ "grad_norm": 0.039722565561532974,
+ "learning_rate": 9.094269870609981e-05,
+ "loss": 0.0581,
+ "step": 1180
+ },
+ {
+ "epoch": 2.199630314232902,
+ "grad_norm": 0.021510232239961624,
+ "learning_rate": 9.001848428835489e-05,
+ "loss": 0.0052,
+ "step": 1190
+ },
+ {
+ "epoch": 2.2181146025878005,
+ "grad_norm": 0.019746674224734306,
+ "learning_rate": 8.909426987060999e-05,
+ "loss": 0.0049,
+ "step": 1200
+ },
+ {
+ "epoch": 2.2181146025878005,
+ "eval_accuracy": 0.9921414538310412,
+ "eval_loss": 0.024552814662456512,
+ "eval_runtime": 52.686,
+ "eval_samples_per_second": 28.983,
+ "eval_steps_per_second": 3.625,
+ "step": 1200
+ },
+ {
+ "epoch": 2.2365988909426986,
+ "grad_norm": 4.809552192687988,
+ "learning_rate": 8.817005545286507e-05,
+ "loss": 0.098,
+ "step": 1210
+ },
+ {
+ "epoch": 2.255083179297597,
+ "grad_norm": 0.22049099206924438,
+ "learning_rate": 8.724584103512015e-05,
+ "loss": 0.1328,
+ "step": 1220
+ },
+ {
+ "epoch": 2.2735674676524953,
+ "grad_norm": 0.02430686727166176,
+ "learning_rate": 8.632162661737525e-05,
+ "loss": 0.0332,
+ "step": 1230
+ },
+ {
+ "epoch": 2.292051756007394,
+ "grad_norm": 0.16566839814186096,
+ "learning_rate": 8.539741219963033e-05,
+ "loss": 0.0242,
+ "step": 1240
+ },
+ {
+ "epoch": 2.310536044362292,
+ "grad_norm": 0.07895852625370026,
+ "learning_rate": 8.447319778188541e-05,
+ "loss": 0.0394,
+ "step": 1250
+ },
+ {
+ "epoch": 2.3290203327171906,
+ "grad_norm": 0.01941494271159172,
+ "learning_rate": 8.354898336414049e-05,
+ "loss": 0.0373,
+ "step": 1260
+ },
+ {
+ "epoch": 2.3475046210720887,
+ "grad_norm": 0.018574291840195656,
+ "learning_rate": 8.262476894639557e-05,
+ "loss": 0.0582,
+ "step": 1270
+ },
+ {
+ "epoch": 2.3659889094269873,
+ "grad_norm": 9.006904602050781,
+ "learning_rate": 8.170055452865065e-05,
+ "loss": 0.075,
+ "step": 1280
+ },
+ {
+ "epoch": 2.3844731977818854,
+ "grad_norm": 0.5771515965461731,
+ "learning_rate": 8.077634011090573e-05,
+ "loss": 0.0217,
+ "step": 1290
+ },
+ {
+ "epoch": 2.402957486136784,
+ "grad_norm": 0.01840708591043949,
+ "learning_rate": 7.985212569316082e-05,
+ "loss": 0.0152,
+ "step": 1300
+ },
+ {
+ "epoch": 2.402957486136784,
+ "eval_accuracy": 0.9986902423051736,
+ "eval_loss": 0.008291647769510746,
+ "eval_runtime": 53.4499,
+ "eval_samples_per_second": 28.569,
+ "eval_steps_per_second": 3.573,
+ "step": 1300
+ },
1038
+ {
+ "epoch": 2.421441774491682,
+ "grad_norm": 0.017435792833566666,
+ "learning_rate": 7.89279112754159e-05,
+ "loss": 0.0448,
+ "step": 1310
+ },
+ {
+ "epoch": 2.43992606284658,
+ "grad_norm": 0.7729086875915527,
+ "learning_rate": 7.800369685767098e-05,
+ "loss": 0.0444,
+ "step": 1320
+ },
+ {
+ "epoch": 2.4584103512014788,
+ "grad_norm": 0.059264715760946274,
+ "learning_rate": 7.707948243992606e-05,
+ "loss": 0.0397,
+ "step": 1330
+ },
+ {
+ "epoch": 2.476894639556377,
+ "grad_norm": 0.024057278409600258,
+ "learning_rate": 7.615526802218114e-05,
+ "loss": 0.028,
+ "step": 1340
+ },
+ {
+ "epoch": 2.4953789279112755,
+ "grad_norm": 0.022951899096369743,
+ "learning_rate": 7.523105360443624e-05,
+ "loss": 0.0444,
+ "step": 1350
+ },
+ {
+ "epoch": 2.5138632162661736,
+ "grad_norm": 0.021782563999295235,
+ "learning_rate": 7.430683918669132e-05,
+ "loss": 0.0385,
+ "step": 1360
+ },
+ {
+ "epoch": 2.532347504621072,
+ "grad_norm": 0.1371038258075714,
+ "learning_rate": 7.33826247689464e-05,
+ "loss": 0.0188,
+ "step": 1370
+ },
+ {
+ "epoch": 2.5508317929759703,
+ "grad_norm": 0.7299683690071106,
+ "learning_rate": 7.245841035120148e-05,
+ "loss": 0.0845,
+ "step": 1380
+ },
+ {
+ "epoch": 2.569316081330869,
+ "grad_norm": 0.34656259417533875,
+ "learning_rate": 7.153419593345656e-05,
+ "loss": 0.0436,
+ "step": 1390
+ },
+ {
+ "epoch": 2.587800369685767,
+ "grad_norm": 0.10165718197822571,
+ "learning_rate": 7.060998151571166e-05,
+ "loss": 0.08,
+ "step": 1400
+ },
+ {
+ "epoch": 2.587800369685767,
+ "eval_accuracy": 0.9941060903732809,
+ "eval_loss": 0.021378275007009506,
+ "eval_runtime": 52.8132,
+ "eval_samples_per_second": 28.913,
+ "eval_steps_per_second": 3.617,
+ "step": 1400
+ },
+ {
+ "epoch": 2.6062846580406656,
+ "grad_norm": 5.586907863616943,
+ "learning_rate": 6.968576709796674e-05,
+ "loss": 0.0295,
+ "step": 1410
+ },
+ {
+ "epoch": 2.6247689463955637,
+ "grad_norm": 0.0221896730363369,
+ "learning_rate": 6.876155268022182e-05,
+ "loss": 0.0627,
+ "step": 1420
+ },
+ {
+ "epoch": 2.6432532347504623,
+ "grad_norm": 0.30416977405548096,
+ "learning_rate": 6.78373382624769e-05,
+ "loss": 0.0035,
+ "step": 1430
+ },
+ {
+ "epoch": 2.6617375231053604,
+ "grad_norm": 0.102454274892807,
+ "learning_rate": 6.691312384473198e-05,
+ "loss": 0.0641,
+ "step": 1440
+ },
+ {
+ "epoch": 2.6802218114602585,
+ "grad_norm": 0.023131974041461945,
+ "learning_rate": 6.598890942698706e-05,
+ "loss": 0.0326,
+ "step": 1450
+ },
+ {
+ "epoch": 2.698706099815157,
+ "grad_norm": 0.09067076444625854,
+ "learning_rate": 6.506469500924215e-05,
+ "loss": 0.017,
+ "step": 1460
+ },
+ {
+ "epoch": 2.7171903881700556,
+ "grad_norm": 3.3906850814819336,
+ "learning_rate": 6.414048059149723e-05,
+ "loss": 0.029,
+ "step": 1470
+ },
+ {
+ "epoch": 2.7356746765249538,
+ "grad_norm": 0.061337146908044815,
+ "learning_rate": 6.321626617375231e-05,
+ "loss": 0.0168,
+ "step": 1480
+ },
+ {
+ "epoch": 2.754158964879852,
+ "grad_norm": 0.19621238112449646,
+ "learning_rate": 6.229205175600739e-05,
+ "loss": 0.006,
+ "step": 1490
+ },
+ {
+ "epoch": 2.7726432532347505,
+ "grad_norm": 0.012029612436890602,
+ "learning_rate": 6.136783733826249e-05,
+ "loss": 0.0043,
+ "step": 1500
+ },
+ {
+ "epoch": 2.7726432532347505,
+ "eval_accuracy": 0.9980353634577603,
+ "eval_loss": 0.006946724373847246,
+ "eval_runtime": 52.203,
+ "eval_samples_per_second": 29.251,
+ "eval_steps_per_second": 3.659,
+ "step": 1500
+ },
+ {
+ "epoch": 2.791127541589649,
+ "grad_norm": 0.014309920370578766,
+ "learning_rate": 6.044362292051756e-05,
+ "loss": 0.0074,
+ "step": 1510
+ },
+ {
+ "epoch": 2.809611829944547,
+ "grad_norm": 3.063054323196411,
+ "learning_rate": 5.951940850277264e-05,
+ "loss": 0.0045,
+ "step": 1520
+ },
+ {
+ "epoch": 2.8280961182994453,
+ "grad_norm": 0.011617097072303295,
+ "learning_rate": 5.859519408502773e-05,
+ "loss": 0.0525,
+ "step": 1530
+ },
+ {
+ "epoch": 2.846580406654344,
+ "grad_norm": 5.252607345581055,
+ "learning_rate": 5.767097966728281e-05,
+ "loss": 0.0104,
+ "step": 1540
+ },
+ {
+ "epoch": 2.865064695009242,
+ "grad_norm": 0.014846362173557281,
+ "learning_rate": 5.674676524953789e-05,
+ "loss": 0.0265,
+ "step": 1550
+ },
+ {
+ "epoch": 2.8835489833641406,
+ "grad_norm": 0.011737200431525707,
+ "learning_rate": 5.5822550831792974e-05,
+ "loss": 0.0543,
+ "step": 1560
+ },
+ {
+ "epoch": 2.9020332717190387,
+ "grad_norm": 0.012772896327078342,
+ "learning_rate": 5.4898336414048056e-05,
+ "loss": 0.0018,
+ "step": 1570
+ },
+ {
+ "epoch": 2.9205175600739373,
+ "grad_norm": 0.06962817162275314,
+ "learning_rate": 5.397412199630314e-05,
+ "loss": 0.0234,
+ "step": 1580
+ },
+ {
+ "epoch": 2.9390018484288354,
+ "grad_norm": 0.019341696053743362,
+ "learning_rate": 5.304990757855823e-05,
+ "loss": 0.105,
+ "step": 1590
+ },
+ {
+ "epoch": 2.957486136783734,
+ "grad_norm": 4.673314571380615,
+ "learning_rate": 5.2125693160813314e-05,
+ "loss": 0.0501,
+ "step": 1600
+ },
+ {
+ "epoch": 2.957486136783734,
+ "eval_accuracy": 0.9967256057629339,
+ "eval_loss": 0.015068226493895054,
+ "eval_runtime": 51.6353,
+ "eval_samples_per_second": 29.573,
+ "eval_steps_per_second": 3.699,
+ "step": 1600
+ },
+ {
+ "epoch": 2.975970425138632,
+ "grad_norm": 0.018514908850193024,
+ "learning_rate": 5.1201478743068395e-05,
+ "loss": 0.0312,
+ "step": 1610
+ },
+ {
+ "epoch": 2.9944547134935307,
+ "grad_norm": 0.0645008459687233,
+ "learning_rate": 5.027726432532348e-05,
+ "loss": 0.0489,
+ "step": 1620
+ },
+ {
+ "epoch": 3.0129390018484288,
+ "grad_norm": 0.017880817875266075,
+ "learning_rate": 4.935304990757856e-05,
+ "loss": 0.0366,
+ "step": 1630
+ },
+ {
+ "epoch": 3.0314232902033273,
+ "grad_norm": 0.04122663289308548,
+ "learning_rate": 4.8428835489833646e-05,
+ "loss": 0.0539,
+ "step": 1640
+ },
+ {
+ "epoch": 3.0499075785582255,
+ "grad_norm": 0.022179430350661278,
+ "learning_rate": 4.750462107208873e-05,
+ "loss": 0.0248,
+ "step": 1650
+ },
+ {
+ "epoch": 3.068391866913124,
+ "grad_norm": 0.924117386341095,
+ "learning_rate": 4.658040665434381e-05,
+ "loss": 0.02,
+ "step": 1660
+ },
+ {
+ "epoch": 3.086876155268022,
+ "grad_norm": 0.01614381931722164,
+ "learning_rate": 4.565619223659889e-05,
+ "loss": 0.023,
+ "step": 1670
+ },
+ {
+ "epoch": 3.1053604436229207,
+ "grad_norm": 0.05051511153578758,
+ "learning_rate": 4.473197781885398e-05,
+ "loss": 0.0041,
+ "step": 1680
+ },
+ {
+ "epoch": 3.123844731977819,
+ "grad_norm": 0.02787856012582779,
+ "learning_rate": 4.380776340110906e-05,
+ "loss": 0.0163,
+ "step": 1690
+ },
+ {
+ "epoch": 3.142329020332717,
+ "grad_norm": 0.21667926013469696,
+ "learning_rate": 4.288354898336414e-05,
+ "loss": 0.0186,
+ "step": 1700
+ },
+ {
+ "epoch": 3.142329020332717,
+ "eval_accuracy": 0.9973804846103471,
+ "eval_loss": 0.007818276062607765,
+ "eval_runtime": 52.8582,
+ "eval_samples_per_second": 28.889,
+ "eval_steps_per_second": 3.613,
+ "step": 1700
+ },
+ {
+ "epoch": 3.1608133086876156,
+ "grad_norm": 0.02714550867676735,
+ "learning_rate": 4.195933456561922e-05,
+ "loss": 0.0178,
+ "step": 1710
+ },
+ {
+ "epoch": 3.1792975970425137,
+ "grad_norm": 0.5191987156867981,
+ "learning_rate": 4.1035120147874305e-05,
+ "loss": 0.0582,
+ "step": 1720
+ },
+ {
+ "epoch": 3.1977818853974123,
+ "grad_norm": 0.02666807919740677,
+ "learning_rate": 4.011090573012939e-05,
+ "loss": 0.007,
+ "step": 1730
+ },
+ {
+ "epoch": 3.2162661737523104,
+ "grad_norm": 0.06601597368717194,
+ "learning_rate": 3.9186691312384474e-05,
+ "loss": 0.0477,
+ "step": 1740
+ },
+ {
+ "epoch": 3.234750462107209,
+ "grad_norm": 0.0280216746032238,
+ "learning_rate": 3.826247689463956e-05,
+ "loss": 0.0048,
+ "step": 1750
+ },
+ {
+ "epoch": 3.253234750462107,
+ "grad_norm": 4.720592021942139,
+ "learning_rate": 3.7338262476894644e-05,
+ "loss": 0.0186,
+ "step": 1760
+ },
+ {
+ "epoch": 3.2717190388170057,
+ "grad_norm": 0.01574169471859932,
+ "learning_rate": 3.6414048059149726e-05,
+ "loss": 0.0017,
+ "step": 1770
+ },
+ {
+ "epoch": 3.290203327171904,
+ "grad_norm": 0.02533087506890297,
+ "learning_rate": 3.548983364140481e-05,
+ "loss": 0.0025,
+ "step": 1780
+ },
+ {
+ "epoch": 3.3086876155268024,
+ "grad_norm": 0.013142619282007217,
+ "learning_rate": 3.456561922365989e-05,
+ "loss": 0.0376,
+ "step": 1790
+ },
+ {
+ "epoch": 3.3271719038817005,
+ "grad_norm": 0.07316397875547409,
+ "learning_rate": 3.364140480591497e-05,
+ "loss": 0.0033,
+ "step": 1800
+ },
+ {
+ "epoch": 3.3271719038817005,
+ "eval_accuracy": 0.9960707269155207,
+ "eval_loss": 0.013949541375041008,
+ "eval_runtime": 53.0604,
+ "eval_samples_per_second": 28.779,
+ "eval_steps_per_second": 3.6,
+ "step": 1800
+ },
+ {
+ "epoch": 3.345656192236599,
+ "grad_norm": 0.015296310186386108,
+ "learning_rate": 3.271719038817006e-05,
+ "loss": 0.0015,
+ "step": 1810
+ },
+ {
+ "epoch": 3.364140480591497,
+ "grad_norm": 5.960048198699951,
+ "learning_rate": 3.179297597042514e-05,
+ "loss": 0.0222,
+ "step": 1820
+ },
+ {
+ "epoch": 3.3826247689463957,
+ "grad_norm": 0.21616186201572418,
+ "learning_rate": 3.086876155268023e-05,
+ "loss": 0.0038,
+ "step": 1830
+ },
+ {
+ "epoch": 3.401109057301294,
+ "grad_norm": 0.015051410533487797,
+ "learning_rate": 2.994454713493531e-05,
+ "loss": 0.0019,
+ "step": 1840
+ },
+ {
+ "epoch": 3.4195933456561924,
+ "grad_norm": 13.381204605102539,
+ "learning_rate": 2.902033271719039e-05,
+ "loss": 0.0182,
+ "step": 1850
+ },
+ {
+ "epoch": 3.4380776340110906,
+ "grad_norm": 0.1726062297821045,
+ "learning_rate": 2.8096118299445472e-05,
+ "loss": 0.0022,
+ "step": 1860
+ },
+ {
+ "epoch": 3.4565619223659887,
+ "grad_norm": 0.01701999455690384,
+ "learning_rate": 2.7171903881700557e-05,
+ "loss": 0.0014,
+ "step": 1870
+ },
+ {
+ "epoch": 3.4750462107208873,
+ "grad_norm": 0.013869056478142738,
+ "learning_rate": 2.624768946395564e-05,
+ "loss": 0.0013,
+ "step": 1880
+ },
+ {
+ "epoch": 3.4935304990757854,
+ "grad_norm": 0.021621432155370712,
+ "learning_rate": 2.532347504621072e-05,
+ "loss": 0.0016,
+ "step": 1890
+ },
+ {
+ "epoch": 3.512014787430684,
+ "grad_norm": 1.3106377124786377,
+ "learning_rate": 2.4399260628465805e-05,
+ "loss": 0.0023,
+ "step": 1900
+ },
+ {
+ "epoch": 3.512014787430684,
+ "eval_accuracy": 0.9986902423051736,
+ "eval_loss": 0.0075506423600018024,
+ "eval_runtime": 50.8135,
+ "eval_samples_per_second": 30.051,
+ "eval_steps_per_second": 3.759,
+ "step": 1900
+ },
+ {
+ "epoch": 3.530499075785582,
+ "grad_norm": 0.01985827274620533,
+ "learning_rate": 2.347504621072089e-05,
+ "loss": 0.0016,
+ "step": 1910
+ },
+ {
+ "epoch": 3.5489833641404807,
+ "grad_norm": 0.013897390104830265,
+ "learning_rate": 2.255083179297597e-05,
+ "loss": 0.0308,
+ "step": 1920
+ },
+ {
+ "epoch": 3.567467652495379,
+ "grad_norm": 0.009370139800012112,
+ "learning_rate": 2.1626617375231053e-05,
+ "loss": 0.0123,
+ "step": 1930
+ },
+ {
+ "epoch": 3.5859519408502774,
+ "grad_norm": 0.019544150680303574,
+ "learning_rate": 2.0702402957486137e-05,
+ "loss": 0.0257,
+ "step": 1940
+ },
+ {
+ "epoch": 3.6044362292051755,
+ "grad_norm": 0.018746808171272278,
+ "learning_rate": 1.9778188539741222e-05,
+ "loss": 0.03,
+ "step": 1950
+ },
+ {
+ "epoch": 3.622920517560074,
+ "grad_norm": 0.009196238592267036,
+ "learning_rate": 1.8853974121996304e-05,
+ "loss": 0.0011,
+ "step": 1960
+ },
+ {
+ "epoch": 3.641404805914972,
+ "grad_norm": 0.011442320421338081,
+ "learning_rate": 1.7929759704251385e-05,
+ "loss": 0.0012,
+ "step": 1970
+ },
+ {
+ "epoch": 3.6598890942698707,
+ "grad_norm": 0.010710498318076134,
+ "learning_rate": 1.700554528650647e-05,
+ "loss": 0.0019,
+ "step": 1980
+ },
+ {
+ "epoch": 3.678373382624769,
+ "grad_norm": 0.06102241575717926,
+ "learning_rate": 1.6081330868761555e-05,
+ "loss": 0.0012,
+ "step": 1990
+ },
+ {
+ "epoch": 3.6968576709796674,
+ "grad_norm": 0.008612744510173798,
+ "learning_rate": 1.5157116451016636e-05,
+ "loss": 0.0054,
+ "step": 2000
+ },
+ {
+ "epoch": 3.6968576709796674,
+ "eval_accuracy": 0.9993451211525868,
+ "eval_loss": 0.0047513521276414394,
+ "eval_runtime": 52.2618,
+ "eval_samples_per_second": 29.218,
+ "eval_steps_per_second": 3.655,
+ "step": 2000
+ },
+ {
+ "epoch": 3.7153419593345656,
+ "grad_norm": 0.008234468288719654,
+ "learning_rate": 1.423290203327172e-05,
+ "loss": 0.043,
+ "step": 2010
+ },
+ {
+ "epoch": 3.733826247689464,
+ "grad_norm": 0.008917649276554585,
+ "learning_rate": 1.3308687615526803e-05,
+ "loss": 0.0384,
+ "step": 2020
+ },
+ {
+ "epoch": 3.7523105360443623,
+ "grad_norm": 0.00844865757972002,
+ "learning_rate": 1.2384473197781886e-05,
+ "loss": 0.0013,
+ "step": 2030
+ },
+ {
+ "epoch": 3.7707948243992604,
+ "grad_norm": 0.008531128987669945,
+ "learning_rate": 1.1460258780036969e-05,
+ "loss": 0.0195,
+ "step": 2040
+ },
+ {
+ "epoch": 3.789279112754159,
+ "grad_norm": 0.009270643815398216,
+ "learning_rate": 1.0536044362292052e-05,
+ "loss": 0.0392,
+ "step": 2050
+ },
+ {
+ "epoch": 3.8077634011090575,
+ "grad_norm": 0.009245671331882477,
+ "learning_rate": 9.611829944547135e-06,
+ "loss": 0.0011,
+ "step": 2060
+ },
+ {
+ "epoch": 3.8262476894639557,
+ "grad_norm": 0.01690092496573925,
+ "learning_rate": 8.687615526802218e-06,
+ "loss": 0.0016,
+ "step": 2070
+ },
+ {
+ "epoch": 3.844731977818854,
+ "grad_norm": 0.015731679275631905,
+ "learning_rate": 7.763401109057302e-06,
+ "loss": 0.0317,
+ "step": 2080
+ },
+ {
+ "epoch": 3.8632162661737524,
+ "grad_norm": 3.0953285694122314,
+ "learning_rate": 6.931608133086876e-06,
+ "loss": 0.0454,
+ "step": 2090
+ },
+ {
+ "epoch": 3.8817005545286505,
+ "grad_norm": 6.279654502868652,
+ "learning_rate": 6.0073937153419595e-06,
+ "loss": 0.0168,
+ "step": 2100
+ },
+ {
+ "epoch": 3.8817005545286505,
+ "eval_accuracy": 0.9986902423051736,
+ "eval_loss": 0.006641203537583351,
+ "eval_runtime": 52.9204,
+ "eval_samples_per_second": 28.855,
+ "eval_steps_per_second": 3.609,
+ "step": 2100
+ },
+ {
+ "epoch": 3.900184842883549,
+ "grad_norm": 0.009602474048733711,
+ "learning_rate": 5.083179297597043e-06,
+ "loss": 0.0011,
+ "step": 2110
+ },
+ {
+ "epoch": 3.918669131238447,
+ "grad_norm": 12.240010261535645,
+ "learning_rate": 4.158964879852126e-06,
+ "loss": 0.0236,
+ "step": 2120
+ },
+ {
+ "epoch": 3.9371534195933457,
+ "grad_norm": 0.03988449275493622,
+ "learning_rate": 3.234750462107209e-06,
+ "loss": 0.0014,
+ "step": 2130
+ },
+ {
+ "epoch": 3.955637707948244,
+ "grad_norm": 5.554378986358643,
+ "learning_rate": 2.310536044362292e-06,
+ "loss": 0.0041,
+ "step": 2140
+ },
+ {
+ "epoch": 3.9741219963031424,
+ "grad_norm": 0.0083112558349967,
+ "learning_rate": 1.3863216266173753e-06,
+ "loss": 0.02,
+ "step": 2150
+ },
+ {
+ "epoch": 3.9926062846580406,
+ "grad_norm": 2.2959258556365967,
+ "learning_rate": 4.621072088724585e-07,
+ "loss": 0.0053,
+ "step": 2160
+ },
+ {
+ "epoch": 4.0,
+ "step": 2164,
+ "total_flos": 2.6818427765818e+18,
+ "train_loss": 0.0841421499820822,
+ "train_runtime": 2597.595,
+ "train_samples_per_second": 13.323,
+ "train_steps_per_second": 0.833
+ }
+ ],
+ "logging_steps": 10,
+ "max_steps": 2164,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 4,
+ "save_steps": 100,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 2.6818427765818e+18,
+ "train_batch_size": 16,
+ "trial_name": null,
+ "trial_params": null
+ }
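The `log_history` in this trainer_state.json interleaves training entries (`loss`, `learning_rate`, `grad_norm`, every 10 steps) with evaluation entries (`eval_accuracy`, `eval_loss`, every 100 steps). The highest `eval_accuracy` in this range, 0.9993451211525868 at step 2000, is the value the updated README reports. The Python sketch below is not part of this commit. It picks that best evaluation entry and checks two patterns the numbers imply: first, `epoch` equals `step / 541`, which is `max_steps` (2164) divided by `num_train_epochs` (4); second, the logged learning rates fit a linear decay to zero over 2164 steps. The base rate of 2e-4 and the absence of warmup are assumptions inferred from the logged values, not settings stated in this file. The entries are a hard-coded subset copied from the log, so the sketch does not load the actual file.

```python
import math

# Subset of "log_history" from the trainer_state.json diff above.
# Values are copied verbatim; unrelated keys (runtime, grad_norm, ...) are omitted.
LOG_HISTORY = [
    # Training-log entries (logged every 10 steps).
    {"epoch": 2.1072088724584104, "learning_rate": 9.463955637707949e-05, "loss": 0.015, "step": 1140},
    {"epoch": 2.2181146025878005, "learning_rate": 8.909426987060999e-05, "loss": 0.0049, "step": 1200},
    {"epoch": 3.6968576709796674, "learning_rate": 1.5157116451016636e-05, "loss": 0.0054, "step": 2000},
    {"epoch": 3.844731977818854, "learning_rate": 7.763401109057302e-06, "loss": 0.0317, "step": 2080},
    # Evaluation entries (every 100 steps).
    {"epoch": 2.2181146025878005, "eval_accuracy": 0.9921414538310412, "step": 1200},
    {"epoch": 2.402957486136784, "eval_accuracy": 0.9986902423051736, "step": 1300},
    {"epoch": 2.587800369685767, "eval_accuracy": 0.9941060903732809, "step": 1400},
    {"epoch": 2.7726432532347505, "eval_accuracy": 0.9980353634577603, "step": 1500},
    {"epoch": 2.957486136783734, "eval_accuracy": 0.9967256057629339, "step": 1600},
    {"epoch": 3.142329020332717, "eval_accuracy": 0.9973804846103471, "step": 1700},
    {"epoch": 3.3271719038817005, "eval_accuracy": 0.9960707269155207, "step": 1800},
    {"epoch": 3.512014787430684, "eval_accuracy": 0.9986902423051736, "step": 1900},
    {"epoch": 3.6968576709796674, "eval_accuracy": 0.9993451211525868, "step": 2000},
    {"epoch": 3.8817005545286505, "eval_accuracy": 0.9986902423051736, "step": 2100},
]
MAX_STEPS = 2164        # "max_steps" in this file
NUM_TRAIN_EPOCHS = 4    # "num_train_epochs" in this file
BASE_LR = 2e-4          # assumption: inferred from the logged learning rates, not stated here


def best_eval(history):
    """Return the evaluation entry with the highest eval_accuracy (earliest step wins ties)."""
    evals = [e for e in history if "eval_accuracy" in e]
    return max(evals, key=lambda e: (e["eval_accuracy"], -e["step"]))


def linear_decay_lr(step, base_lr=BASE_LR, max_steps=MAX_STEPS):
    """Hypothesized schedule: linear decay from base_lr to 0 over max_steps, no warmup."""
    return base_lr * (max_steps - step) / max_steps


def check_log(history):
    """Assert epoch == step / steps_per_epoch and learning_rate matches linear_decay_lr."""
    steps_per_epoch = MAX_STEPS / NUM_TRAIN_EPOCHS  # 541.0
    for e in history:
        assert math.isclose(e["epoch"], e["step"] / steps_per_epoch, rel_tol=1e-12)
        if "learning_rate" in e:
            assert math.isclose(e["learning_rate"], linear_decay_lr(e["step"]), rel_tol=1e-9)


if __name__ == "__main__":
    check_log(LOG_HISTORY)
    best = best_eval(LOG_HISTORY)
    print(f"best eval_accuracy {best['eval_accuracy']:.4f} at step {best['step']}")
    # prints: best eval_accuracy 0.9993 at step 2000
```

If you extend `LOG_HISTORY` to every training entry, the learning-rate check fails from step 2090 onward. From that step, each logged rate is one step's worth (about 9.24e-8) higher than the formula predicts. The sketch only reports this offset and does not attempt to explain its cause.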