bigscience-bot
committed on
Commit 8457336 · 1 Parent(s): 44b1547
new data
- logs/main_log.txt +68 -0
logs/main_log.txt
CHANGED
@@ -86889,3 +86889,71 @@ time (ms)
86889 |
time (ms)
|
86890 |
iteration 1256/ 292968 | consumed samples: 2572288 | consumed tokens: 243695616 | elapsed time per iteration (ms): 91479.6 | learning rate: 6.859E-05 | global batch size: 2048 | lm loss: 4.337495E+00 | loss scale: 16384.0 | grad norm: 9382.482 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86891 |
time (ms)
|
86892 |
+
iteration 1257/ 292968 | consumed samples: 2574336 | consumed tokens: 243957760 | elapsed time per iteration (ms): 89077.2 | learning rate: 6.865E-05 | global batch size: 2048 | lm loss: 4.360833E+00 | loss scale: 16384.0 | grad norm: 10931.909 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86893 |
+
time (ms)
|
86894 |
+
iteration 1258/ 292968 | consumed samples: 2576384 | consumed tokens: 244219904 | elapsed time per iteration (ms): 89543.6 | learning rate: 6.870E-05 | global batch size: 2048 | lm loss: 4.355038E+00 | loss scale: 16384.0 | grad norm: 12315.148 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86895 |
+
time (ms)
|
86896 |
+
iteration 1259/ 292968 | consumed samples: 2578432 | consumed tokens: 244482048 | elapsed time per iteration (ms): 86626.2 | learning rate: 6.876E-05 | global batch size: 2048 | lm loss: 4.332624E+00 | loss scale: 16384.0 | grad norm: 9028.785 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86897 |
+
time (ms)
|
86898 |
+
iteration 1260/ 292968 | consumed samples: 2580480 | consumed tokens: 244744192 | elapsed time per iteration (ms): 88403.0 | learning rate: 6.881E-05 | global batch size: 2048 | lm loss: 4.353878E+00 | loss scale: 16384.0 | grad norm: 8587.953 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86899 |
+
time (ms)
|
86900 |
+
iteration 1261/ 292968 | consumed samples: 2582528 | consumed tokens: 245006336 | elapsed time per iteration (ms): 90653.6 | learning rate: 6.887E-05 | global batch size: 2048 | lm loss: 4.406543E+00 | loss scale: 16384.0 | grad norm: 8519.735 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86901 |
+
time (ms)
|
86902 |
+
iteration 1262/ 292968 | consumed samples: 2584576 | consumed tokens: 245268480 | elapsed time per iteration (ms): 101721.7 | learning rate: 6.892E-05 | global batch size: 2048 | lm loss: 4.337947E+00 | loss scale: 16384.0 | grad norm: 10856.149 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86903 |
+
time (ms)
|
86904 |
+
iteration 1263/ 292968 | consumed samples: 2586624 | consumed tokens: 245530624 | elapsed time per iteration (ms): 98966.3 | learning rate: 6.898E-05 | global batch size: 2048 | lm loss: 4.345151E+00 | loss scale: 16384.0 | grad norm: 12642.575 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86905 |
+
time (ms)
|
86906 |
+
iteration 1264/ 292968 | consumed samples: 2588672 | consumed tokens: 245792768 | elapsed time per iteration (ms): 104276.2 | learning rate: 6.903E-05 | global batch size: 2048 | lm loss: 4.373935E+00 | loss scale: 16384.0 | grad norm: 13739.412 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86907 |
+
time (ms)
|
86908 |
+
iteration 1265/ 292968 | consumed samples: 2590720 | consumed tokens: 246054912 | elapsed time per iteration (ms): 106458.8 | learning rate: 6.909E-05 | global batch size: 2048 | lm loss: 4.336057E+00 | loss scale: 16384.0 | grad norm: 13718.934 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86909 |
+
time (ms)
|
86910 |
+
iteration 1266/ 292968 | consumed samples: 2592768 | consumed tokens: 246317056 | elapsed time per iteration (ms): 109558.3 | learning rate: 6.914E-05 | global batch size: 2048 | lm loss: 4.348790E+00 | loss scale: 16384.0 | grad norm: 15140.293 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86911 |
+
time (ms)
|
86912 |
+
iteration 1267/ 292968 | consumed samples: 2594816 | consumed tokens: 246579200 | elapsed time per iteration (ms): 101169.1 | learning rate: 6.920E-05 | global batch size: 2048 | lm loss: 4.336976E+00 | loss scale: 16384.0 | grad norm: 18580.935 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86913 |
+
time (ms)
|
86914 |
+
iteration 1268/ 292968 | consumed samples: 2596864 | consumed tokens: 246841344 | elapsed time per iteration (ms): 103186.3 | learning rate: 6.925E-05 | global batch size: 2048 | lm loss: 4.351308E+00 | loss scale: 16384.0 | grad norm: 9034.022 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86915 |
+
time (ms)
|
86916 |
+
iteration 1269/ 292968 | consumed samples: 2598912 | consumed tokens: 247103488 | elapsed time per iteration (ms): 103322.1 | learning rate: 6.930E-05 | global batch size: 2048 | lm loss: 4.338009E+00 | loss scale: 16384.0 | grad norm: 10030.218 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86917 |
+
time (ms)
|
86918 |
+
iteration 1270/ 292968 | consumed samples: 2600960 | consumed tokens: 247365632 | elapsed time per iteration (ms): 104430.5 | learning rate: 6.936E-05 | global batch size: 2048 | lm loss: 4.323060E+00 | loss scale: 16384.0 | grad norm: 10375.946 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86919 |
+
time (ms)
|
86920 |
+
iteration 1271/ 292968 | consumed samples: 2603008 | consumed tokens: 247627776 | elapsed time per iteration (ms): 101797.9 | learning rate: 6.941E-05 | global batch size: 2048 | lm loss: 4.337749E+00 | loss scale: 16384.0 | grad norm: 8465.022 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86921 |
+
time (ms)
|
86922 |
+
iteration 1272/ 292968 | consumed samples: 2605056 | consumed tokens: 247889920 | elapsed time per iteration (ms): 105815.4 | learning rate: 6.947E-05 | global batch size: 2048 | lm loss: 4.322408E+00 | loss scale: 16384.0 | grad norm: 8592.805 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86923 |
+
time (ms)
|
86924 |
+
iteration 1273/ 292968 | consumed samples: 2607104 | consumed tokens: 248152064 | elapsed time per iteration (ms): 108179.9 | learning rate: 6.952E-05 | global batch size: 2048 | lm loss: 4.321740E+00 | loss scale: 16384.0 | grad norm: 10722.339 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86925 |
+
time (ms)
|
86926 |
+
iteration 1274/ 292968 | consumed samples: 2609152 | consumed tokens: 248414208 | elapsed time per iteration (ms): 110063.2 | learning rate: 6.958E-05 | global batch size: 2048 | lm loss: 4.321163E+00 | loss scale: 16384.0 | grad norm: 12199.826 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86927 |
+
time (ms)
|
86928 |
+
iteration 1275/ 292968 | consumed samples: 2611200 | consumed tokens: 248676352 | elapsed time per iteration (ms): 112486.2 | learning rate: 6.963E-05 | global batch size: 2048 | lm loss: 4.359476E+00 | loss scale: 16384.0 | grad norm: 13015.753 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86929 |
+
time (ms)
|
86930 |
+
iteration 1276/ 292968 | consumed samples: 2613248 | consumed tokens: 248938496 | elapsed time per iteration (ms): 119132.6 | learning rate: 6.969E-05 | global batch size: 2048 | lm loss: 4.368865E+00 | loss scale: 16384.0 | grad norm: 12810.900 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86931 |
+
time (ms)
|
86932 |
+
iteration 1277/ 292968 | consumed samples: 2615296 | consumed tokens: 249200640 | elapsed time per iteration (ms): 124483.3 | learning rate: 6.974E-05 | global batch size: 2048 | lm loss: 4.319435E+00 | loss scale: 16384.0 | grad norm: 11086.670 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86933 |
+
time (ms)
|
86934 |
+
iteration 1278/ 292968 | consumed samples: 2617344 | consumed tokens: 249462784 | elapsed time per iteration (ms): 131501.7 | learning rate: 6.980E-05 | global batch size: 2048 | lm loss: 4.343135E+00 | loss scale: 16384.0 | grad norm: 10249.176 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86935 |
+
time (ms)
|
86936 |
+
iteration 1279/ 292968 | consumed samples: 2619392 | consumed tokens: 249724928 | elapsed time per iteration (ms): 122263.3 | learning rate: 6.985E-05 | global batch size: 2048 | lm loss: 4.333991E+00 | loss scale: 16384.0 | grad norm: 8418.978 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86937 |
+
time (ms)
|
86938 |
+
iteration 1280/ 292968 | consumed samples: 2621440 | consumed tokens: 249987072 | elapsed time per iteration (ms): 125027.7 | learning rate: 6.991E-05 | global batch size: 2048 | lm loss: 4.344658E+00 | loss scale: 16384.0 | grad norm: 9345.066 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86939 |
+
time (ms)
|
86940 |
+
iteration 1281/ 292968 | consumed samples: 2623488 | consumed tokens: 250249216 | elapsed time per iteration (ms): 119818.3 | learning rate: 6.996E-05 | global batch size: 2048 | lm loss: 4.340658E+00 | loss scale: 16384.0 | grad norm: 11343.930 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86941 |
+
time (ms)
|
86942 |
+
iteration 1282/ 292968 | consumed samples: 2625536 | consumed tokens: 250511360 | elapsed time per iteration (ms): 107960.9 | learning rate: 7.001E-05 | global batch size: 2048 | lm loss: 4.367644E+00 | loss scale: 16384.0 | grad norm: 11059.651 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86943 |
+
time (ms)
|
86944 |
+
iteration 1283/ 292968 | consumed samples: 2627584 | consumed tokens: 250773504 | elapsed time per iteration (ms): 103476.2 | learning rate: 7.007E-05 | global batch size: 2048 | lm loss: 4.343670E+00 | loss scale: 16384.0 | grad norm: 9443.485 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86945 |
+
time (ms)
|
86946 |
+
iteration 1284/ 292968 | consumed samples: 2629632 | consumed tokens: 251035648 | elapsed time per iteration (ms): 113204.7 | learning rate: 7.012E-05 | global batch size: 2048 | lm loss: 4.341036E+00 | loss scale: 16384.0 | grad norm: 10326.934 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86947 |
+
time (ms)
|
86948 |
+
iteration 1285/ 292968 | consumed samples: 2631680 | consumed tokens: 251297792 | elapsed time per iteration (ms): 101453.0 | learning rate: 7.018E-05 | global batch size: 2048 | lm loss: 4.335133E+00 | loss scale: 16384.0 | grad norm: 13935.373 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86949 |
+
time (ms)
|
86950 |
+
iteration 1286/ 292968 | consumed samples: 2633728 | consumed tokens: 251559936 | elapsed time per iteration (ms): 101126.4 | learning rate: 7.023E-05 | global batch size: 2048 | lm loss: 4.328067E+00 | loss scale: 16384.0 | grad norm: 13261.563 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86951 |
+
time (ms)
|
86952 |
+
iteration 1287/ 292968 | consumed samples: 2635776 | consumed tokens: 251822080 | elapsed time per iteration (ms): 101433.7 | learning rate: 7.029E-05 | global batch size: 2048 | lm loss: 4.332537E+00 | loss scale: 16384.0 | grad norm: 10151.353 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86953 |
+
time (ms)
|
86954 |
+
iteration 1288/ 292968 | consumed samples: 2637824 | consumed tokens: 252084224 | elapsed time per iteration (ms): 97179.0 | learning rate: 7.034E-05 | global batch size: 2048 | lm loss: 4.328178E+00 | loss scale: 16384.0 | grad norm: 12186.076 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86955 |
+
time (ms)
|
86956 |
+
iteration 1289/ 292968 | consumed samples: 2639872 | consumed tokens: 252346368 | elapsed time per iteration (ms): 97410.4 | learning rate: 7.040E-05 | global batch size: 2048 | lm loss: 4.303625E+00 | loss scale: 16384.0 | grad norm: 15999.316 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86957 |
+
time (ms)
|
86958 |
+
iteration 1290/ 292968 | consumed samples: 2641920 | consumed tokens: 252608512 | elapsed time per iteration (ms): 97712.4 | learning rate: 7.045E-05 | global batch size: 2048 | lm loss: 4.325552E+00 | loss scale: 16384.0 | grad norm: 17938.209 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
86959 |
+
time (ms)
|
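The iteration lines in this log share a `key: value | key: value` layout, so they can be read mechanically. What follows is a minimal sketch, not part of the training code: the function name `parse_log_line` and the two sample constants are hypothetical, and the field names are taken verbatim from the log. It parses two adjacent entries and checks that the per-iteration growth in consumed tokens equals global batch size times curriculum seqlen.

```python
import re

# Two adjacent entries copied from the log above (iterations 1256 and 1257).
LINE_1256 = (
    "iteration 1256/ 292968 | consumed samples: 2572288 | "
    "consumed tokens: 243695616 | elapsed time per iteration (ms): 91479.6 | "
    "learning rate: 6.859E-05 | global batch size: 2048 | "
    "lm loss: 4.337495E+00 | loss scale: 16384.0 | grad norm: 9382.482 | "
    "num zeros: 0.0 | curriculum seqlen: 128 | "
    "number of skipped iterations: 0 | number of nan iterations: 0 |"
)
LINE_1257 = (
    "iteration 1257/ 292968 | consumed samples: 2574336 | "
    "consumed tokens: 243957760 | elapsed time per iteration (ms): 89077.2 | "
    "learning rate: 6.865E-05 | global batch size: 2048 | "
    "lm loss: 4.360833E+00 | loss scale: 16384.0 | grad norm: 10931.909 | "
    "num zeros: 0.0 | curriculum seqlen: 128 | "
    "number of skipped iterations: 0 | number of nan iterations: 0 |"
)


def parse_log_line(line):
    """Parse one 'iteration N/ M | key: value | ...' line into a dict.

    Hypothetical helper for illustration; values become int when possible,
    otherwise float.
    """
    parts = [p.strip() for p in line.split("|") if p.strip()]
    match = re.match(r"iteration\s+(\d+)/\s*(\d+)", parts[0])
    if match is None:
        raise ValueError("not an iteration line")
    fields = {
        "iteration": int(match.group(1)),
        "total iterations": int(match.group(2)),
    }
    for part in parts[1:]:
        key, _, value = part.partition(":")
        value = value.strip()
        try:
            fields[key.strip()] = int(value)
        except ValueError:
            fields[key.strip()] = float(value)
    return fields


first = parse_log_line(LINE_1256)
second = parse_log_line(LINE_1257)

# Each iteration consumes one global batch of samples ...
samples_per_step = second["consumed samples"] - first["consumed samples"]
# ... and each sample contributes 'curriculum seqlen' tokens.
tokens_per_step = second["consumed tokens"] - first["consumed tokens"]

assert samples_per_step == first["global batch size"]
assert tokens_per_step == first["global batch size"] * first["curriculum seqlen"]
```

The same helper could be applied to every iteration line to track trends such as the elapsed time per iteration, which in the entries above ranges from roughly 87 to 132 seconds.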