bigscience-bot committed on
Commit 69acce6
Parent(s): 53ce2e7
new data
logs/main_log.txt +92 -0
logs/main_log.txt CHANGED
@@ -67273,3 +67273,95 @@ time (ms)
 time (ms)
 iteration 765/ 292968 | consumed samples: 1566720 | consumed tokens: 127303680 | elapsed time per iteration (ms): 76814.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
 time (ms)
+iteration 766/ 292968 | consumed samples: 1568768 | consumed tokens: 127516672 | elapsed time per iteration (ms): 77218.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 767/ 292968 | consumed samples: 1570816 | consumed tokens: 127729664 | elapsed time per iteration (ms): 77724.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 768/ 292968 | consumed samples: 1572864 | consumed tokens: 127942656 | elapsed time per iteration (ms): 79202.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 769/ 292968 | consumed samples: 1574912 | consumed tokens: 128155648 | elapsed time per iteration (ms): 78713.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 770/ 292968 | consumed samples: 1576960 | consumed tokens: 128368640 | elapsed time per iteration (ms): 78768.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 771/ 292968 | consumed samples: 1579008 | consumed tokens: 128581632 | elapsed time per iteration (ms): 77027.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 772/ 292968 | consumed samples: 1581056 | consumed tokens: 128794624 | elapsed time per iteration (ms): 77694.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 773/ 292968 | consumed samples: 1583104 | consumed tokens: 129007616 | elapsed time per iteration (ms): 78285.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 774/ 292968 | consumed samples: 1585152 | consumed tokens: 129220608 | elapsed time per iteration (ms): 77768.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 775/ 292968 | consumed samples: 1587200 | consumed tokens: 129433600 | elapsed time per iteration (ms): 78751.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 776/ 292968 | consumed samples: 1589248 | consumed tokens: 129646592 | elapsed time per iteration (ms): 78528.9 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 777/ 292968 | consumed samples: 1591296 | consumed tokens: 129859584 | elapsed time per iteration (ms): 78682.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 778/ 292968 | consumed samples: 1593344 | consumed tokens: 130072576 | elapsed time per iteration (ms): 77272.9 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 779/ 292968 | consumed samples: 1595392 | consumed tokens: 130285568 | elapsed time per iteration (ms): 80038.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 780/ 292968 | consumed samples: 1597440 | consumed tokens: 130498560 | elapsed time per iteration (ms): 77708.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 781/ 292968 | consumed samples: 1599488 | consumed tokens: 130711552 | elapsed time per iteration (ms): 77785.1 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 782/ 292968 | consumed samples: 1601536 | consumed tokens: 130924544 | elapsed time per iteration (ms): 77721.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 783/ 292968 | consumed samples: 1603584 | consumed tokens: 131137536 | elapsed time per iteration (ms): 78420.5 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 784/ 292968 | consumed samples: 1605632 | consumed tokens: 131350528 | elapsed time per iteration (ms): 78087.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 785/ 292968 | consumed samples: 1607680 | consumed tokens: 131563520 | elapsed time per iteration (ms): 79958.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 786/ 292968 | consumed samples: 1609728 | consumed tokens: 131776512 | elapsed time per iteration (ms): 78833.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 787/ 292968 | consumed samples: 1611776 | consumed tokens: 131989504 | elapsed time per iteration (ms): 76965.1 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 788/ 292968 | consumed samples: 1613824 | consumed tokens: 132202496 | elapsed time per iteration (ms): 77924.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 789/ 292968 | consumed samples: 1615872 | consumed tokens: 132415488 | elapsed time per iteration (ms): 78840.8 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 790/ 292968 | consumed samples: 1617920 | consumed tokens: 132628480 | elapsed time per iteration (ms): 77402.5 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 791/ 292968 | consumed samples: 1619968 | consumed tokens: 132841472 | elapsed time per iteration (ms): 78261.1 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 792/ 292968 | consumed samples: 1622016 | consumed tokens: 133054464 | elapsed time per iteration (ms): 80176.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 793/ 292968 | consumed samples: 1624064 | consumed tokens: 133267456 | elapsed time per iteration (ms): 79974.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 794/ 292968 | consumed samples: 1626112 | consumed tokens: 133480448 | elapsed time per iteration (ms): 77972.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 795/ 292968 | consumed samples: 1628160 | consumed tokens: 133693440 | elapsed time per iteration (ms): 78413.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 796/ 292968 | consumed samples: 1630208 | consumed tokens: 133906432 | elapsed time per iteration (ms): 79004.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 797/ 292968 | consumed samples: 1632256 | consumed tokens: 134119424 | elapsed time per iteration (ms): 76848.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 798/ 292968 | consumed samples: 1634304 | consumed tokens: 134332416 | elapsed time per iteration (ms): 78243.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 799/ 292968 | consumed samples: 1636352 | consumed tokens: 134545408 | elapsed time per iteration (ms): 79156.5 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 800/ 292968 | consumed samples: 1638400 | consumed tokens: 134758400 | elapsed time per iteration (ms): 77568.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 801/ 292968 | consumed samples: 1640448 | consumed tokens: 134971392 | elapsed time per iteration (ms): 78323.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 802/ 292968 | consumed samples: 1642496 | consumed tokens: 135184384 | elapsed time per iteration (ms): 78633.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 803/ 292968 | consumed samples: 1644544 | consumed tokens: 135397376 | elapsed time per iteration (ms): 78813.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 804/ 292968 | consumed samples: 1646592 | consumed tokens: 135610368 | elapsed time per iteration (ms): 78171.9 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 805/ 292968 | consumed samples: 1648640 | consumed tokens: 135823360 | elapsed time per iteration (ms): 77535.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 806/ 292968 | consumed samples: 1650688 | consumed tokens: 136036352 | elapsed time per iteration (ms): 76979.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 807/ 292968 | consumed samples: 1652736 | consumed tokens: 136249344 | elapsed time per iteration (ms): 79204.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 808/ 292968 | consumed samples: 1654784 | consumed tokens: 136462336 | elapsed time per iteration (ms): 77025.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 809/ 292968 | consumed samples: 1656832 | consumed tokens: 136675328 | elapsed time per iteration (ms): 77032.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 810/ 292968 | consumed samples: 1658880 | consumed tokens: 136888320 | elapsed time per iteration (ms): 78530.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 811/ 292968 | consumed samples: 1660928 | consumed tokens: 137101312 | elapsed time per iteration (ms): 78796.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
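Every iteration record in this log has the same layout. It starts with an `iteration N/ TOTAL` counter and is followed by `key: value` fields separated by `|`. The following is a minimal Python sketch that assumes this layout holds for every record. It parses the lines and cross-checks their counters. From one iteration to the next, consumed samples should grow by the global batch size (2048). Consumed tokens should grow by global batch size × curriculum seqlen (2048 × 104 = 212,992), which is what the records for iterations 765 to 811 show.

The helpers `parse_iteration_line`, `check_counters` and `tokens_per_second` are hypothetical illustrations. They are not part of the training code that wrote this log. The `"total iterations"` key is also a name invented here.

```python
import re
from statistics import mean

# Three consecutive iteration records copied verbatim from logs/main_log.txt.
SAMPLE_LOG = """\
iteration 765/ 292968 | consumed samples: 1566720 | consumed tokens: 127303680 | elapsed time per iteration (ms): 76814.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 766/ 292968 | consumed samples: 1568768 | consumed tokens: 127516672 | elapsed time per iteration (ms): 77218.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 767/ 292968 | consumed samples: 1570816 | consumed tokens: 127729664 | elapsed time per iteration (ms): 77724.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
"""

# Matches the leading "iteration N/ TOTAL" counter of a record.
ITER_RE = re.compile(r"iteration\s+(\d+)/\s*(\d+)")


def _number(text):
    """Convert a logged value to int when possible, else to float."""
    text = text.strip()
    try:
        return int(text)
    except ValueError:
        return float(text)


def parse_iteration_line(line):
    """Parse one iteration record into a dict keyed by the log's field names.

    Hypothetical helper: assumes '|'-separated 'key: value' fields.
    Returns None for lines that are not iteration records, e.g. "time (ms)".
    """
    match = ITER_RE.match(line.strip())
    if match is None:
        return None
    record = {
        "iteration": int(match.group(1)),
        "total iterations": int(match.group(2)),  # invented key name
    }
    for field in line.split("|")[1:]:  # field 0 is the iteration counter
        if ":" not in field:
            continue  # the empty field after the trailing '|'
        key, value = field.rsplit(":", 1)
        record[key.strip()] = _number(value)
    return record


def check_counters(records):
    """Assert each step adds one batch of samples and batch * seqlen tokens."""
    for prev, cur in zip(records, records[1:]):
        batch = cur["global batch size"]
        assert cur["iteration"] == prev["iteration"] + 1
        assert cur["consumed samples"] - prev["consumed samples"] == batch
        step_tokens = cur["consumed tokens"] - prev["consumed tokens"]
        assert step_tokens == batch * cur["curriculum seqlen"]


def tokens_per_second(record):
    """Throughput of one step: batch * seqlen tokens over the logged step time."""
    step_tokens = record["global batch size"] * record["curriculum seqlen"]
    return step_tokens / (record["elapsed time per iteration (ms)"] / 1000.0)


records = [r for r in map(parse_iteration_line, SAMPLE_LOG.splitlines()) if r]
check_counters(records)
times = [r["elapsed time per iteration (ms)"] for r in records]
print(f"mean step time: {mean(times):.1f} ms")
for r in records:
    print(f"iteration {r['iteration']}: {tokens_per_second(r):.0f} tokens/s")
```

The checks use per-step differences rather than absolute totals. This avoids assuming that the batch size or sequence length stayed constant since iteration 1. Throughput is derived from the logged per-iteration time, so it inherits whatever averaging the logger applies to that field.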