bigscience-bot
committed on
Commit
·
6511b94
1
Parent(s):
ed982cb
new data
logs/main_log.txt +63 -0
logs/main_log.txt
CHANGED
@@ -116300,3 +116300,66 @@ time (ms)
|
|
116300 |
time (ms)
|
116301 |
iteration 3140/ 292968 | consumed samples: 6430720 | consumed tokens: 942702592 | elapsed time per iteration (ms): 109945.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.550672E+00 | loss scale: 131072.0 | grad norm: 55870.403 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116302 |
time (ms)
|
116303 |
+
iteration 3141/ 292968 | consumed samples: 6432768 | consumed tokens: 943177728 | elapsed time per iteration (ms): 111833.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.535420E+00 | loss scale: 131072.0 | grad norm: 54687.584 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116304 |
+
time (ms)
|
116305 |
+
iteration 3142/ 292968 | consumed samples: 6434816 | consumed tokens: 943652864 | elapsed time per iteration (ms): 109935.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.554422E+00 | loss scale: 131072.0 | grad norm: 46354.847 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116306 |
+
time (ms)
|
116307 |
+
iteration 3143/ 292968 | consumed samples: 6436864 | consumed tokens: 944128000 | elapsed time per iteration (ms): 110450.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.515105E+00 | loss scale: 131072.0 | grad norm: 42457.256 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116308 |
+
time (ms)
|
116309 |
+
iteration 3144/ 292968 | consumed samples: 6438912 | consumed tokens: 944603136 | elapsed time per iteration (ms): 110392.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.491606E+00 | loss scale: 131072.0 | grad norm: 47675.537 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116310 |
+
time (ms)
|
116311 |
+
iteration 3145/ 292968 | consumed samples: 6440960 | consumed tokens: 945078272 | elapsed time per iteration (ms): 110165.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.545086E+00 | loss scale: 131072.0 | grad norm: 40437.099 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116312 |
+
time (ms)
|
116313 |
+
iteration 3146/ 292968 | consumed samples: 6443008 | consumed tokens: 945553408 | elapsed time per iteration (ms): 109112.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.519203E+00 | loss scale: 131072.0 | grad norm: 40121.803 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116314 |
+
time (ms)
|
116315 |
+
iteration 3147/ 292968 | consumed samples: 6445056 | consumed tokens: 946028544 | elapsed time per iteration (ms): 109992.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.507916E+00 | loss scale: 131072.0 | grad norm: 39602.549 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116316 |
+
time (ms)
|
116317 |
+
iteration 3148/ 292968 | consumed samples: 6447104 | consumed tokens: 946503680 | elapsed time per iteration (ms): 110837.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.501790E+00 | loss scale: 131072.0 | grad norm: 37185.032 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116318 |
+
time (ms)
|
116319 |
+
iteration 3149/ 292968 | consumed samples: 6449152 | consumed tokens: 946978816 | elapsed time per iteration (ms): 109989.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.528901E+00 | loss scale: 131072.0 | grad norm: 44056.823 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116320 |
+
time (ms)
|
116321 |
+
iteration 3150/ 292968 | consumed samples: 6451200 | consumed tokens: 947453952 | elapsed time per iteration (ms): 110689.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.500736E+00 | loss scale: 131072.0 | grad norm: 34733.114 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116322 |
+
time (ms)
|
116323 |
+
------------------------------------------------------------------------------------------------
|
116324 |
+
validation loss at iteration 3150 | lm loss value: 3.517273E+00 | lm loss PPL: 3.369244E+01 |
|
116325 |
+
------------------------------------------------------------------------------------------------
|
116326 |
+
iteration 3151/ 292968 | consumed samples: 6453248 | consumed tokens: 947929088 | elapsed time per iteration (ms): 289717.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.513340E+00 | loss scale: 131072.0 | grad norm: 35613.642 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116327 |
+
time (ms)
|
116328 |
+
iteration 3152/ 292968 | consumed samples: 6455296 | consumed tokens: 948404224 | elapsed time per iteration (ms): 109415.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.519228E+00 | loss scale: 131072.0 | grad norm: 46331.769 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116329 |
+
time (ms)
|
116330 |
+
iteration 3153/ 292968 | consumed samples: 6457344 | consumed tokens: 948879360 | elapsed time per iteration (ms): 108618.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.528655E+00 | loss scale: 131072.0 | grad norm: 62191.264 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116331 |
+
time (ms)
|
116332 |
+
iteration 3154/ 292968 | consumed samples: 6459392 | consumed tokens: 949354496 | elapsed time per iteration (ms): 109050.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.531178E+00 | loss scale: 131072.0 | grad norm: 55588.878 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116333 |
+
time (ms)
|
116334 |
+
iteration 3155/ 292968 | consumed samples: 6461440 | consumed tokens: 949829632 | elapsed time per iteration (ms): 111657.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.522779E+00 | loss scale: 131072.0 | grad norm: 44837.393 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116335 |
+
time (ms)
|
116336 |
+
iteration 3156/ 292968 | consumed samples: 6463488 | consumed tokens: 950304768 | elapsed time per iteration (ms): 110189.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.523057E+00 | loss scale: 131072.0 | grad norm: 43731.420 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116337 |
+
time (ms)
|
116338 |
+
iteration 3157/ 292968 | consumed samples: 6465536 | consumed tokens: 950779904 | elapsed time per iteration (ms): 110493.5 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.496690E+00 | loss scale: 131072.0 | grad norm: 46192.470 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116339 |
+
time (ms)
|
116340 |
+
iteration 3158/ 292968 | consumed samples: 6467584 | consumed tokens: 951255040 | elapsed time per iteration (ms): 109909.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.517199E+00 | loss scale: 131072.0 | grad norm: 31717.912 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116341 |
+
time (ms)
|
116342 |
+
iteration 3159/ 292968 | consumed samples: 6469632 | consumed tokens: 951730176 | elapsed time per iteration (ms): 110040.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.518413E+00 | loss scale: 131072.0 | grad norm: 40340.483 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116343 |
+
time (ms)
|
116344 |
+
iteration 3160/ 292968 | consumed samples: 6471680 | consumed tokens: 952205312 | elapsed time per iteration (ms): 111087.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.519091E+00 | loss scale: 131072.0 | grad norm: 32898.784 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116345 |
+
time (ms)
|
116346 |
+
iteration 3161/ 292968 | consumed samples: 6473728 | consumed tokens: 952680448 | elapsed time per iteration (ms): 109338.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.527358E+00 | loss scale: 131072.0 | grad norm: 34774.966 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116347 |
+
time (ms)
|
116348 |
+
iteration 3162/ 292968 | consumed samples: 6475776 | consumed tokens: 953155584 | elapsed time per iteration (ms): 108656.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.513849E+00 | loss scale: 131072.0 | grad norm: 39540.117 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116349 |
+
time (ms)
|
116350 |
+
iteration 3163/ 292968 | consumed samples: 6477824 | consumed tokens: 953630720 | elapsed time per iteration (ms): 109547.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.511124E+00 | loss scale: 131072.0 | grad norm: 48375.830 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116351 |
+
time (ms)
|
116352 |
+
iteration 3164/ 292968 | consumed samples: 6479872 | consumed tokens: 954105856 | elapsed time per iteration (ms): 113586.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.508611E+00 | loss scale: 131072.0 | grad norm: 52037.682 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116353 |
+
time (ms)
|
116354 |
+
iteration 3165/ 292968 | consumed samples: 6481920 | consumed tokens: 954580992 | elapsed time per iteration (ms): 114860.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.541578E+00 | loss scale: 131072.0 | grad norm: 41480.973 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116355 |
+
time (ms)
|
116356 |
+
iteration 3166/ 292968 | consumed samples: 6483968 | consumed tokens: 955056128 | elapsed time per iteration (ms): 121137.3 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.516208E+00 | loss scale: 131072.0 | grad norm: 41301.397 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116357 |
+
time (ms)
|
116358 |
+
iteration 3167/ 292968 | consumed samples: 6486016 | consumed tokens: 955531264 | elapsed time per iteration (ms): 110110.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.504046E+00 | loss scale: 131072.0 | grad norm: 47013.136 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116359 |
+
time (ms)
|
116360 |
+
iteration 3168/ 292968 | consumed samples: 6488064 | consumed tokens: 956006400 | elapsed time per iteration (ms): 110799.4 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.523125E+00 | loss scale: 131072.0 | grad norm: 53442.123 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116361 |
+
time (ms)
|
116362 |
+
iteration 3169/ 292968 | consumed samples: 6490112 | consumed tokens: 956481536 | elapsed time per iteration (ms): 109797.0 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.518640E+00 | loss scale: 131072.0 | grad norm: 44658.960 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116363 |
+
time (ms)
|
116364 |
+
iteration 3170/ 292968 | consumed samples: 6492160 | consumed tokens: 956956672 | elapsed time per iteration (ms): 109397.8 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.506108E+00 | loss scale: 131072.0 | grad norm: 37584.401 | num zeros: 0.0 | curriculum seqlen: 232 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
116365 |
+
time (ms)
|
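The log lines above can be sanity-checked with a short script. The following is a minimal sketch, not part of the training code: the helper `parse_fields` is a hypothetical name introduced here, and the three sample lines are copied from the log above. It checks two relationships that hold in these lines: the per-iteration increase in `consumed tokens` equals `global batch size` times `curriculum seqlen`, and the reported `lm loss PPL` matches `exp(lm loss value)` up to print rounding.

```python
import math
import re

# Lines copied verbatim from the log above (iterations 3140, 3141 and the
# validation report at iteration 3150).
LINE_3140 = (
    " iteration 3140/ 292968 | consumed samples: 6430720 | consumed tokens: 942702592 | "
    "elapsed time per iteration (ms): 109945.3 | learning rate: 1.000E-04 | "
    "global batch size: 2048 | lm loss: 3.550672E+00 | loss scale: 131072.0 | "
    "grad norm: 55870.403 | num zeros: 0.0 | curriculum seqlen: 232 | "
    "number of skipped iterations: 0 | number of nan iterations: 0 |"
)
LINE_3141 = (
    " iteration 3141/ 292968 | consumed samples: 6432768 | consumed tokens: 943177728 | "
    "elapsed time per iteration (ms): 111833.7 | learning rate: 1.000E-04 | "
    "global batch size: 2048 | lm loss: 3.535420E+00 | loss scale: 131072.0 | "
    "grad norm: 54687.584 | num zeros: 0.0 | curriculum seqlen: 232 | "
    "number of skipped iterations: 0 | number of nan iterations: 0 |"
)
VALIDATION_LINE = (
    " validation loss at iteration 3150 | lm loss value: 3.517273E+00 | "
    "lm loss PPL: 3.369244E+01 |"
)


def parse_fields(line):
    """Split a '|'-delimited log line into a {field name: raw string value} dict.

    Hypothetical helper for illustration; segments without a 'name: value'
    shape (other than the leading 'iteration N/ M' segment) are skipped.
    """
    fields = {}
    for part in line.split("|"):
        part = part.strip()
        if not part:
            continue
        match = re.match(r"iteration\s+(\d+)/\s*(\d+)$", part)
        if match:
            fields["iteration"] = match.group(1)
            fields["total iterations"] = match.group(2)
        elif ":" in part:
            name, value = part.split(":", 1)
            fields[name.strip()] = value.strip()
    return fields


prev = parse_fields(LINE_3140)
curr = parse_fields(LINE_3141)

# Increase in consumed samples and tokens over one iteration.
sample_delta = int(curr["consumed samples"]) - int(prev["consumed samples"])
token_delta = int(curr["consumed tokens"]) - int(prev["consumed tokens"])

# Observed in these lines: tokens per step = batch size * curriculum seqlen.
expected_tokens = int(curr["global batch size"]) * int(curr["curriculum seqlen"])

val = parse_fields(VALIDATION_LINE)
val_loss = float(val["lm loss value"])
val_ppl = float(val["lm loss PPL"])

print("samples per iteration:", sample_delta)
print("tokens per iteration:", token_delta, "expected:", expected_tokens)
print("exp(validation loss):", math.exp(val_loss), "logged PPL:", val_ppl)
```

The comparison against the logged perplexity should use a tolerance, since both values are printed with only seven significant digits.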