bigscience-bot committed
Commit d0a856e · 1 Parent(s): 9322457
new data
logs/main_log.txt +92 -0
@@ -67365,3 +67365,95 @@ time (ms)
67365 |
time (ms)
|
67366 |
iteration 811/ 292968 | consumed samples: 1660928 | consumed tokens: 137101312 | elapsed time per iteration (ms): 78796.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67367 |
time (ms)
|
67368 |
+
iteration 812/ 292968 | consumed samples: 1662976 | consumed tokens: 137314304 | elapsed time per iteration (ms): 76478.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67369 |
+
time (ms)
|
67370 |
+
iteration 813/ 292968 | consumed samples: 1665024 | consumed tokens: 137527296 | elapsed time per iteration (ms): 78875.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67371 |
+
time (ms)
|
67372 |
+
iteration 814/ 292968 | consumed samples: 1667072 | consumed tokens: 137740288 | elapsed time per iteration (ms): 77038.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67373 |
+
time (ms)
|
67374 |
+
iteration 815/ 292968 | consumed samples: 1669120 | consumed tokens: 137953280 | elapsed time per iteration (ms): 78966.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67375 |
+
time (ms)
|
67376 |
+
iteration 816/ 292968 | consumed samples: 1671168 | consumed tokens: 138166272 | elapsed time per iteration (ms): 78271.8 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67377 |
+
time (ms)
|
67378 |
+
iteration 817/ 292968 | consumed samples: 1673216 | consumed tokens: 138379264 | elapsed time per iteration (ms): 78760.9 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67379 |
+
time (ms)
|
67380 |
+
iteration 818/ 292968 | consumed samples: 1675264 | consumed tokens: 138592256 | elapsed time per iteration (ms): 80164.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67381 |
+
time (ms)
|
67382 |
+
iteration 819/ 292968 | consumed samples: 1677312 | consumed tokens: 138805248 | elapsed time per iteration (ms): 78758.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67383 |
+
time (ms)
|
67384 |
+
iteration 820/ 292968 | consumed samples: 1679360 | consumed tokens: 139018240 | elapsed time per iteration (ms): 80404.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67385 |
+
time (ms)
|
67386 |
+
iteration 821/ 292968 | consumed samples: 1681408 | consumed tokens: 139231232 | elapsed time per iteration (ms): 77913.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67387 |
+
time (ms)
|
67388 |
+
iteration 822/ 292968 | consumed samples: 1683456 | consumed tokens: 139444224 | elapsed time per iteration (ms): 77540.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67389 |
+
time (ms)
|
67390 |
+
iteration 823/ 292968 | consumed samples: 1685504 | consumed tokens: 139657216 | elapsed time per iteration (ms): 76602.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67391 |
+
time (ms)
|
67392 |
+
iteration 824/ 292968 | consumed samples: 1687552 | consumed tokens: 139870208 | elapsed time per iteration (ms): 77871.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67393 |
+
time (ms)
|
67394 |
+
iteration 825/ 292968 | consumed samples: 1689600 | consumed tokens: 140083200 | elapsed time per iteration (ms): 81554.1 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67395 |
+
time (ms)
|
67396 |
+
iteration 826/ 292968 | consumed samples: 1691648 | consumed tokens: 140296192 | elapsed time per iteration (ms): 77593.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67397 |
+
time (ms)
|
67398 |
+
iteration 827/ 292968 | consumed samples: 1693696 | consumed tokens: 140509184 | elapsed time per iteration (ms): 76966.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67399 |
+
time (ms)
|
67400 |
+
iteration 828/ 292968 | consumed samples: 1695744 | consumed tokens: 140722176 | elapsed time per iteration (ms): 78500.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67401 |
+
time (ms)
|
67402 |
+
iteration 829/ 292968 | consumed samples: 1697792 | consumed tokens: 140935168 | elapsed time per iteration (ms): 78281.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67403 |
+
time (ms)
|
67404 |
+
iteration 830/ 292968 | consumed samples: 1699840 | consumed tokens: 141148160 | elapsed time per iteration (ms): 76785.8 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67405 |
+
time (ms)
|
67406 |
+
iteration 831/ 292968 | consumed samples: 1701888 | consumed tokens: 141361152 | elapsed time per iteration (ms): 78291.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67407 |
+
time (ms)
|
67408 |
+
iteration 832/ 292968 | consumed samples: 1703936 | consumed tokens: 141574144 | elapsed time per iteration (ms): 77150.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67409 |
+
time (ms)
|
67410 |
+
iteration 833/ 292968 | consumed samples: 1705984 | consumed tokens: 141787136 | elapsed time per iteration (ms): 79163.8 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67411 |
+
time (ms)
|
67412 |
+
iteration 834/ 292968 | consumed samples: 1708032 | consumed tokens: 142000128 | elapsed time per iteration (ms): 80157.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67413 |
+
time (ms)
|
67414 |
+
iteration 835/ 292968 | consumed samples: 1710080 | consumed tokens: 142213120 | elapsed time per iteration (ms): 78440.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67415 |
+
time (ms)
|
67416 |
+
iteration 836/ 292968 | consumed samples: 1712128 | consumed tokens: 142426112 | elapsed time per iteration (ms): 76862.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67417 |
+
time (ms)
|
67418 |
+
iteration 837/ 292968 | consumed samples: 1714176 | consumed tokens: 142639104 | elapsed time per iteration (ms): 78281.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67419 |
+
time (ms)
|
67420 |
+
iteration 838/ 292968 | consumed samples: 1716224 | consumed tokens: 142852096 | elapsed time per iteration (ms): 78619.5 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67421 |
+
time (ms)
|
67422 |
+
iteration 839/ 292968 | consumed samples: 1718272 | consumed tokens: 143065088 | elapsed time per iteration (ms): 78310.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67423 |
+
time (ms)
|
67424 |
+
iteration 840/ 292968 | consumed samples: 1720320 | consumed tokens: 143278080 | elapsed time per iteration (ms): 78428.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67425 |
+
time (ms)
|
67426 |
+
iteration 841/ 292968 | consumed samples: 1722368 | consumed tokens: 143491072 | elapsed time per iteration (ms): 78459.9 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67427 |
+
time (ms)
|
67428 |
+
iteration 842/ 292968 | consumed samples: 1724416 | consumed tokens: 143704064 | elapsed time per iteration (ms): 79007.5 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67429 |
+
time (ms)
|
67430 |
+
iteration 843/ 292968 | consumed samples: 1726464 | consumed tokens: 143917056 | elapsed time per iteration (ms): 78188.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67431 |
+
time (ms)
|
67432 |
+
iteration 844/ 292968 | consumed samples: 1728512 | consumed tokens: 144130048 | elapsed time per iteration (ms): 79792.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67433 |
+
time (ms)
|
67434 |
+
iteration 845/ 292968 | consumed samples: 1730560 | consumed tokens: 144343040 | elapsed time per iteration (ms): 79053.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67435 |
+
time (ms)
|
67436 |
+
iteration 846/ 292968 | consumed samples: 1732608 | consumed tokens: 144556032 | elapsed time per iteration (ms): 77709.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67437 |
+
time (ms)
|
67438 |
+
iteration 847/ 292968 | consumed samples: 1734656 | consumed tokens: 144769024 | elapsed time per iteration (ms): 77030.1 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67439 |
+
time (ms)
|
67440 |
+
iteration 848/ 292968 | consumed samples: 1736704 | consumed tokens: 144982016 | elapsed time per iteration (ms): 78480.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67441 |
+
time (ms)
|
67442 |
+
iteration 849/ 292968 | consumed samples: 1738752 | consumed tokens: 145195008 | elapsed time per iteration (ms): 79274.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67443 |
+
time (ms)
|
67444 |
+
iteration 850/ 292968 | consumed samples: 1740800 | consumed tokens: 145408000 | elapsed time per iteration (ms): 78104.5 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67445 |
+
time (ms)
|
67446 |
+
iteration 851/ 292968 | consumed samples: 1742848 | consumed tokens: 145620992 | elapsed time per iteration (ms): 78348.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67447 |
+
time (ms)
|
67448 |
+
iteration 852/ 292968 | consumed samples: 1744896 | consumed tokens: 145833984 | elapsed time per iteration (ms): 78993.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67449 |
+
time (ms)
|
67450 |
+
iteration 853/ 292968 | consumed samples: 1746944 | consumed tokens: 146046976 | elapsed time per iteration (ms): 78849.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67451 |
+
time (ms)
|
67452 |
+
iteration 854/ 292968 | consumed samples: 1748992 | consumed tokens: 146259968 | elapsed time per iteration (ms): 78395.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67453 |
+
time (ms)
|
67454 |
+
iteration 855/ 292968 | consumed samples: 1751040 | consumed tokens: 146472960 | elapsed time per iteration (ms): 77359.8 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67455 |
+
time (ms)
|
67456 |
+
iteration 856/ 292968 | consumed samples: 1753088 | consumed tokens: 146685952 | elapsed time per iteration (ms): 79532.1 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67457 |
+
time (ms)
|
67458 |
+
iteration 857/ 292968 | consumed samples: 1755136 | consumed tokens: 146898944 | elapsed time per iteration (ms): 77728.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
67459 |
+
time (ms)
|
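The counters in the added lines can be cross-checked against each other: between consecutive iterations, `consumed samples` should grow by `global batch size`, and `consumed tokens` by `global batch size` times `curriculum seqlen`. The sketch below is a minimal illustration of that check on two lines copied from this diff. The helper name `parse_iteration_line` is hypothetical and not part of any training codebase, and the parsing assumes the `key: value | key: value` layout seen above.

```python
import re

def parse_iteration_line(line):
    """Hypothetical helper: split one pipe-delimited log line into a dict.

    Assumes each field looks like 'name: number', except the leading
    'iteration N/ TOTAL' field, which is handled separately.
    """
    fields = {}
    for part in line.split("|"):
        part = part.strip()
        if not part:
            continue
        if part.startswith("iteration"):
            m = re.match(r"iteration\s+(\d+)/\s*(\d+)", part)
            fields["iteration"] = int(m.group(1))
            fields["total_iterations"] = int(m.group(2))
        elif ":" in part:
            # Split on the last colon so keys such as
            # 'elapsed time per iteration (ms)' stay intact.
            key, value = part.rsplit(":", 1)
            fields[key.strip()] = float(value)
    return fields

# Two consecutive lines copied from the diff above.
line_811 = "iteration 811/ 292968 | consumed samples: 1660928 | consumed tokens: 137101312 | elapsed time per iteration (ms): 78796.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |"
line_812 = "iteration 812/ 292968 | consumed samples: 1662976 | consumed tokens: 137314304 | elapsed time per iteration (ms): 76478.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |"

prev = parse_iteration_line(line_811)
curr = parse_iteration_line(line_812)

samples_step = curr["consumed samples"] - prev["consumed samples"]
tokens_step = curr["consumed tokens"] - prev["consumed tokens"]

# Each iteration should consume one global batch of samples ...
assert samples_step == curr["global batch size"]
# ... and batch size * current curriculum sequence length tokens.
assert tokens_step == curr["global batch size"] * curr["curriculum seqlen"]
```

For the two lines shown, the sample counter advances by 2048 and the token counter by 212992 (2048 × 104), which matches the logged batch size and curriculum sequence length.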