bigscience-bot committed on
Commit · afc4b02 · 1 Parent(s): 85e53d5
new data
- logs/main_log.txt +70 -0
logs/main_log.txt
CHANGED
@@ -86957,3 +86957,73 @@ time (ms)
 time (ms)
 iteration 1290/ 292968 | consumed samples: 2641920 | consumed tokens: 252608512 | elapsed time per iteration (ms): 97712.4 | learning rate: 7.045E-05 | global batch size: 2048 | lm loss: 4.325552E+00 | loss scale: 16384.0 | grad norm: 17938.209 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
 time (ms)
+iteration 1291/ 292968 | consumed samples: 2643968 | consumed tokens: 252870656 | elapsed time per iteration (ms): 97348.4 | learning rate: 7.051E-05 | global batch size: 2048 | lm loss: 4.313485E+00 | loss scale: 16384.0 | grad norm: 11220.149 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1292/ 292968 | consumed samples: 2646016 | consumed tokens: 253132800 | elapsed time per iteration (ms): 97091.0 | learning rate: 7.056E-05 | global batch size: 2048 | lm loss: 4.339503E+00 | loss scale: 16384.0 | grad norm: 15690.936 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1293/ 292968 | consumed samples: 2648064 | consumed tokens: 253394944 | elapsed time per iteration (ms): 96068.1 | learning rate: 7.062E-05 | global batch size: 2048 | lm loss: 4.308480E+00 | loss scale: 16384.0 | grad norm: 15248.013 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1294/ 292968 | consumed samples: 2650112 | consumed tokens: 253657088 | elapsed time per iteration (ms): 101209.6 | learning rate: 7.067E-05 | global batch size: 2048 | lm loss: 4.299973E+00 | loss scale: 16384.0 | grad norm: 10467.217 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1295/ 292968 | consumed samples: 2652160 | consumed tokens: 253919232 | elapsed time per iteration (ms): 106905.6 | learning rate: 7.072E-05 | global batch size: 2048 | lm loss: 4.325128E+00 | loss scale: 16384.0 | grad norm: 10645.088 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1296/ 292968 | consumed samples: 2654208 | consumed tokens: 254181376 | elapsed time per iteration (ms): 104630.7 | learning rate: 7.078E-05 | global batch size: 2048 | lm loss: 4.317550E+00 | loss scale: 16384.0 | grad norm: 10104.458 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1297/ 292968 | consumed samples: 2656256 | consumed tokens: 254443520 | elapsed time per iteration (ms): 108402.3 | learning rate: 7.083E-05 | global batch size: 2048 | lm loss: 4.301074E+00 | loss scale: 16384.0 | grad norm: 10153.653 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1298/ 292968 | consumed samples: 2658304 | consumed tokens: 254705664 | elapsed time per iteration (ms): 101393.9 | learning rate: 7.089E-05 | global batch size: 2048 | lm loss: 4.313783E+00 | loss scale: 16384.0 | grad norm: 11186.819 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1299/ 292968 | consumed samples: 2660352 | consumed tokens: 254967808 | elapsed time per iteration (ms): 97468.1 | learning rate: 7.094E-05 | global batch size: 2048 | lm loss: 4.331973E+00 | loss scale: 16384.0 | grad norm: 10929.262 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1300/ 292968 | consumed samples: 2662400 | consumed tokens: 255229952 | elapsed time per iteration (ms): 103670.2 | learning rate: 7.100E-05 | global batch size: 2048 | lm loss: 4.320304E+00 | loss scale: 16384.0 | grad norm: 9919.120 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1301/ 292968 | consumed samples: 2664448 | consumed tokens: 255492096 | elapsed time per iteration (ms): 103703.3 | learning rate: 7.105E-05 | global batch size: 2048 | lm loss: 4.336925E+00 | loss scale: 16384.0 | grad norm: 10814.834 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1302/ 292968 | consumed samples: 2666496 | consumed tokens: 255754240 | elapsed time per iteration (ms): 96139.5 | learning rate: 7.111E-05 | global batch size: 2048 | lm loss: 4.318452E+00 | loss scale: 16384.0 | grad norm: 11068.371 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1303/ 292968 | consumed samples: 2668544 | consumed tokens: 256016384 | elapsed time per iteration (ms): 92160.2 | learning rate: 7.116E-05 | global batch size: 2048 | lm loss: 4.331538E+00 | loss scale: 16384.0 | grad norm: 10972.349 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1304/ 292968 | consumed samples: 2670592 | consumed tokens: 256278528 | elapsed time per iteration (ms): 87573.4 | learning rate: 7.122E-05 | global batch size: 2048 | lm loss: 4.307694E+00 | loss scale: 16384.0 | grad norm: 13438.511 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1305/ 292968 | consumed samples: 2672640 | consumed tokens: 256540672 | elapsed time per iteration (ms): 86671.4 | learning rate: 7.127E-05 | global batch size: 2048 | lm loss: 4.338923E+00 | loss scale: 16384.0 | grad norm: 19454.195 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1306/ 292968 | consumed samples: 2674688 | consumed tokens: 256802816 | elapsed time per iteration (ms): 87566.0 | learning rate: 7.133E-05 | global batch size: 2048 | lm loss: 4.320871E+00 | loss scale: 16384.0 | grad norm: 13488.959 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1307/ 292968 | consumed samples: 2676736 | consumed tokens: 257081344 | elapsed time per iteration (ms): 102038.5 | learning rate: 7.138E-05 | global batch size: 2048 | lm loss: 4.413541E+00 | loss scale: 16384.0 | grad norm: 18168.800 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1308/ 292968 | consumed samples: 2678784 | consumed tokens: 257359872 | elapsed time per iteration (ms): 109015.4 | learning rate: 7.143E-05 | global batch size: 2048 | lm loss: 4.372187E+00 | loss scale: 16384.0 | grad norm: 10812.401 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1309/ 292968 | consumed samples: 2680832 | consumed tokens: 257638400 | elapsed time per iteration (ms): 106725.5 | learning rate: 7.149E-05 | global batch size: 2048 | lm loss: 4.395649E+00 | loss scale: 16384.0 | grad norm: 13451.504 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1310/ 292968 | consumed samples: 2682880 | consumed tokens: 257916928 | elapsed time per iteration (ms): 109015.2 | learning rate: 7.154E-05 | global batch size: 2048 | lm loss: 4.441962E+00 | loss scale: 16384.0 | grad norm: 19299.987 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1311/ 292968 | consumed samples: 2684928 | consumed tokens: 258195456 | elapsed time per iteration (ms): 104596.5 | learning rate: 7.160E-05 | global batch size: 2048 | lm loss: 4.378983E+00 | loss scale: 16384.0 | grad norm: 11561.969 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1312/ 292968 | consumed samples: 2686976 | consumed tokens: 258473984 | elapsed time per iteration (ms): 103802.3 | learning rate: 7.165E-05 | global batch size: 2048 | lm loss: 4.374365E+00 | loss scale: 16384.0 | grad norm: 13670.889 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1313/ 292968 | consumed samples: 2689024 | consumed tokens: 258752512 | elapsed time per iteration (ms): 103736.3 | learning rate: 7.171E-05 | global batch size: 2048 | lm loss: 4.348674E+00 | loss scale: 16384.0 | grad norm: 10213.036 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1314/ 292968 | consumed samples: 2691072 | consumed tokens: 259031040 | elapsed time per iteration (ms): 103663.9 | learning rate: 7.176E-05 | global batch size: 2048 | lm loss: 4.331293E+00 | loss scale: 16384.0 | grad norm: 13151.653 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1315/ 292968 | consumed samples: 2693120 | consumed tokens: 259309568 | elapsed time per iteration (ms): 103760.9 | learning rate: 7.182E-05 | global batch size: 2048 | lm loss: 4.315998E+00 | loss scale: 16384.0 | grad norm: 14473.062 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1316/ 292968 | consumed samples: 2695168 | consumed tokens: 259588096 | elapsed time per iteration (ms): 104084.0 | learning rate: 7.187E-05 | global batch size: 2048 | lm loss: 4.349117E+00 | loss scale: 16384.0 | grad norm: 11313.236 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1317/ 292968 | consumed samples: 2697216 | consumed tokens: 259866624 | elapsed time per iteration (ms): 105133.0 | learning rate: 7.193E-05 | global batch size: 2048 | lm loss: 4.324214E+00 | loss scale: 16384.0 | grad norm: 15165.408 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1318/ 292968 | consumed samples: 2699264 | consumed tokens: 260145152 | elapsed time per iteration (ms): 103961.9 | learning rate: 7.198E-05 | global batch size: 2048 | lm loss: 4.297659E+00 | loss scale: 16384.0 | grad norm: 13970.172 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1319/ 292968 | consumed samples: 2701312 | consumed tokens: 260423680 | elapsed time per iteration (ms): 103869.3 | learning rate: 7.203E-05 | global batch size: 2048 | lm loss: 4.315687E+00 | loss scale: 16384.0 | grad norm: 12823.779 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1320/ 292968 | consumed samples: 2703360 | consumed tokens: 260702208 | elapsed time per iteration (ms): 105499.5 | learning rate: 7.209E-05 | global batch size: 2048 | lm loss: 4.339356E+00 | loss scale: 16384.0 | grad norm: 12505.072 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1321/ 292968 | consumed samples: 2705408 | consumed tokens: 260980736 | elapsed time per iteration (ms): 106715.5 | learning rate: 7.214E-05 | global batch size: 2048 | lm loss: 4.322292E+00 | loss scale: 16384.0 | grad norm: 7680.711 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1322/ 292968 | consumed samples: 2707456 | consumed tokens: 261259264 | elapsed time per iteration (ms): 104743.5 | learning rate: 7.220E-05 | global batch size: 2048 | lm loss: 4.303059E+00 | loss scale: 16384.0 | grad norm: 11274.482 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1323/ 292968 | consumed samples: 2709504 | consumed tokens: 261537792 | elapsed time per iteration (ms): 108461.6 | learning rate: 7.225E-05 | global batch size: 2048 | lm loss: 4.283995E+00 | loss scale: 16384.0 | grad norm: 11434.034 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1324/ 292968 | consumed samples: 2711552 | consumed tokens: 261816320 | elapsed time per iteration (ms): 113653.2 | learning rate: 7.231E-05 | global batch size: 2048 | lm loss: 4.292516E+00 | loss scale: 16384.0 | grad norm: 9910.438 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 1325/ 292968 | consumed samples: 2713600 | consumed tokens: 262094848 | elapsed time per iteration (ms): 113595.4 | learning rate: 7.236E-05 | global batch size: 2048 | lm loss: 4.305782E+00 | loss scale: 16384.0 | grad norm: 9792.060 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
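In this hunk, `consumed samples` grows by the global batch size (2048) on every iteration. `consumed tokens` grows by 2048 × `curriculum seqlen`: that is 262144 while the seqlen is 128, and 278528 after it steps to 136 at iteration 1307. The Python sketch below checks that relationship. The helper names (`parse_line`, `token_steps`) are hypothetical. The parser assumes only the `key: value | key: value` layout visible in these lines, not any documented Megatron-DeepSpeed log format.

```python
import re

# Four iteration lines from this hunk (1305-1308), trimmed to the fields the
# check reads; the real log lines carry more "key: value" pairs.
SAMPLE = """\
iteration 1305/ 292968 | consumed samples: 2672640 | consumed tokens: 256540672 | global batch size: 2048 | curriculum seqlen: 128 |
iteration 1306/ 292968 | consumed samples: 2674688 | consumed tokens: 256802816 | global batch size: 2048 | curriculum seqlen: 128 |
iteration 1307/ 292968 | consumed samples: 2676736 | consumed tokens: 257081344 | global batch size: 2048 | curriculum seqlen: 136 |
iteration 1308/ 292968 | consumed samples: 2678784 | consumed tokens: 257359872 | global batch size: 2048 | curriculum seqlen: 136 |
"""

# Matches the leading "iteration N/ TOTAL" field.
ITER_RE = re.compile(r"iteration\s+(\d+)/\s*(\d+)")


def parse_line(line: str) -> dict[str, str]:
    """Turn one 'iteration N/ T | key: value | ...' line into a dict of strings."""
    record: dict[str, str] = {}
    for part in line.split("|"):
        part = part.strip()
        if not part:
            continue
        match = ITER_RE.fullmatch(part)
        if match:
            record["iteration"] = match.group(1)
            record["total iterations"] = match.group(2)
            continue
        # rpartition splits on the last colon, so keys such as
        # "elapsed time per iteration (ms)" stay intact.
        key, _, value = part.rpartition(":")
        record[key.strip()] = value.strip()
    return record


def token_steps(text: str) -> list[tuple[int, int, int, int]]:
    """Return (iteration, samples added, tokens added, batch * seqlen) per step."""
    records = [parse_line(l) for l in text.splitlines() if l.startswith("iteration")]
    steps = []
    for prev, cur in zip(records, records[1:]):
        steps.append((
            int(cur["iteration"]),
            int(cur["consumed samples"]) - int(prev["consumed samples"]),
            int(cur["consumed tokens"]) - int(prev["consumed tokens"]),
            # The token delta appears to use the current step's seqlen.
            int(cur["global batch size"]) * int(cur["curriculum seqlen"]),
        ))
    return steps


for it, d_samples, d_tokens, expected in token_steps(SAMPLE):
    print(it, d_samples, d_tokens, expected, d_tokens == expected)
```

On these four lines, every step adds 2048 samples. The token delta matches batch size × seqlen on each step, including the 1306 → 1307 step where the seqlen changes.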