bigscience-bot committed on
Commit
afc4b02
·
1 Parent(s): 85e53d5
Files changed (1)
  1. logs/main_log.txt +70 -0
logs/main_log.txt CHANGED
@@ -86957,3 +86957,73 @@ time (ms)
86957
  time (ms)
86958
  iteration 1290/ 292968 | consumed samples: 2641920 | consumed tokens: 252608512 | elapsed time per iteration (ms): 97712.4 | learning rate: 7.045E-05 | global batch size: 2048 | lm loss: 4.325552E+00 | loss scale: 16384.0 | grad norm: 17938.209 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
86959
  time (ms)
86960
+ iteration 1291/ 292968 | consumed samples: 2643968 | consumed tokens: 252870656 | elapsed time per iteration (ms): 97348.4 | learning rate: 7.051E-05 | global batch size: 2048 | lm loss: 4.313485E+00 | loss scale: 16384.0 | grad norm: 11220.149 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
86961
+ time (ms)
86962
+ iteration 1292/ 292968 | consumed samples: 2646016 | consumed tokens: 253132800 | elapsed time per iteration (ms): 97091.0 | learning rate: 7.056E-05 | global batch size: 2048 | lm loss: 4.339503E+00 | loss scale: 16384.0 | grad norm: 15690.936 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
86963
+ time (ms)
86964
+ iteration 1293/ 292968 | consumed samples: 2648064 | consumed tokens: 253394944 | elapsed time per iteration (ms): 96068.1 | learning rate: 7.062E-05 | global batch size: 2048 | lm loss: 4.308480E+00 | loss scale: 16384.0 | grad norm: 15248.013 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
86965
+ time (ms)
86966
+ iteration 1294/ 292968 | consumed samples: 2650112 | consumed tokens: 253657088 | elapsed time per iteration (ms): 101209.6 | learning rate: 7.067E-05 | global batch size: 2048 | lm loss: 4.299973E+00 | loss scale: 16384.0 | grad norm: 10467.217 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
86967
+ time (ms)
86968
+ iteration 1295/ 292968 | consumed samples: 2652160 | consumed tokens: 253919232 | elapsed time per iteration (ms): 106905.6 | learning rate: 7.072E-05 | global batch size: 2048 | lm loss: 4.325128E+00 | loss scale: 16384.0 | grad norm: 10645.088 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
86969
+ time (ms)
86970
+ iteration 1296/ 292968 | consumed samples: 2654208 | consumed tokens: 254181376 | elapsed time per iteration (ms): 104630.7 | learning rate: 7.078E-05 | global batch size: 2048 | lm loss: 4.317550E+00 | loss scale: 16384.0 | grad norm: 10104.458 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
86971
+ time (ms)
86972
+ iteration 1297/ 292968 | consumed samples: 2656256 | consumed tokens: 254443520 | elapsed time per iteration (ms): 108402.3 | learning rate: 7.083E-05 | global batch size: 2048 | lm loss: 4.301074E+00 | loss scale: 16384.0 | grad norm: 10153.653 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
86973
+ time (ms)
86974
+ iteration 1298/ 292968 | consumed samples: 2658304 | consumed tokens: 254705664 | elapsed time per iteration (ms): 101393.9 | learning rate: 7.089E-05 | global batch size: 2048 | lm loss: 4.313783E+00 | loss scale: 16384.0 | grad norm: 11186.819 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
86975
+ time (ms)
86976
+ iteration 1299/ 292968 | consumed samples: 2660352 | consumed tokens: 254967808 | elapsed time per iteration (ms): 97468.1 | learning rate: 7.094E-05 | global batch size: 2048 | lm loss: 4.331973E+00 | loss scale: 16384.0 | grad norm: 10929.262 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
86977
+ time (ms)
86978
+ iteration 1300/ 292968 | consumed samples: 2662400 | consumed tokens: 255229952 | elapsed time per iteration (ms): 103670.2 | learning rate: 7.100E-05 | global batch size: 2048 | lm loss: 4.320304E+00 | loss scale: 16384.0 | grad norm: 9919.120 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
86979
+ time (ms)
86980
+ iteration 1301/ 292968 | consumed samples: 2664448 | consumed tokens: 255492096 | elapsed time per iteration (ms): 103703.3 | learning rate: 7.105E-05 | global batch size: 2048 | lm loss: 4.336925E+00 | loss scale: 16384.0 | grad norm: 10814.834 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
86981
+ time (ms)
86982
+ iteration 1302/ 292968 | consumed samples: 2666496 | consumed tokens: 255754240 | elapsed time per iteration (ms): 96139.5 | learning rate: 7.111E-05 | global batch size: 2048 | lm loss: 4.318452E+00 | loss scale: 16384.0 | grad norm: 11068.371 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
86983
+ time (ms)
86984
+ iteration 1303/ 292968 | consumed samples: 2668544 | consumed tokens: 256016384 | elapsed time per iteration (ms): 92160.2 | learning rate: 7.116E-05 | global batch size: 2048 | lm loss: 4.331538E+00 | loss scale: 16384.0 | grad norm: 10972.349 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
86985
+ time (ms)
86986
+ iteration 1304/ 292968 | consumed samples: 2670592 | consumed tokens: 256278528 | elapsed time per iteration (ms): 87573.4 | learning rate: 7.122E-05 | global batch size: 2048 | lm loss: 4.307694E+00 | loss scale: 16384.0 | grad norm: 13438.511 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
86987
+ time (ms)
86988
+ iteration 1305/ 292968 | consumed samples: 2672640 | consumed tokens: 256540672 | elapsed time per iteration (ms): 86671.4 | learning rate: 7.127E-05 | global batch size: 2048 | lm loss: 4.338923E+00 | loss scale: 16384.0 | grad norm: 19454.195 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
86989
+ time (ms)
86990
+ iteration 1306/ 292968 | consumed samples: 2674688 | consumed tokens: 256802816 | elapsed time per iteration (ms): 87566.0 | learning rate: 7.133E-05 | global batch size: 2048 | lm loss: 4.320871E+00 | loss scale: 16384.0 | grad norm: 13488.959 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |
86991
+ time (ms)
86992
+ iteration 1307/ 292968 | consumed samples: 2676736 | consumed tokens: 257081344 | elapsed time per iteration (ms): 102038.5 | learning rate: 7.138E-05 | global batch size: 2048 | lm loss: 4.413541E+00 | loss scale: 16384.0 | grad norm: 18168.800 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
86993
+ time (ms)
86994
+ iteration 1308/ 292968 | consumed samples: 2678784 | consumed tokens: 257359872 | elapsed time per iteration (ms): 109015.4 | learning rate: 7.143E-05 | global batch size: 2048 | lm loss: 4.372187E+00 | loss scale: 16384.0 | grad norm: 10812.401 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
86995
+ time (ms)
86996
+ iteration 1309/ 292968 | consumed samples: 2680832 | consumed tokens: 257638400 | elapsed time per iteration (ms): 106725.5 | learning rate: 7.149E-05 | global batch size: 2048 | lm loss: 4.395649E+00 | loss scale: 16384.0 | grad norm: 13451.504 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
86997
+ time (ms)
86998
+ iteration 1310/ 292968 | consumed samples: 2682880 | consumed tokens: 257916928 | elapsed time per iteration (ms): 109015.2 | learning rate: 7.154E-05 | global batch size: 2048 | lm loss: 4.441962E+00 | loss scale: 16384.0 | grad norm: 19299.987 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
86999
+ time (ms)
87000
+ iteration 1311/ 292968 | consumed samples: 2684928 | consumed tokens: 258195456 | elapsed time per iteration (ms): 104596.5 | learning rate: 7.160E-05 | global batch size: 2048 | lm loss: 4.378983E+00 | loss scale: 16384.0 | grad norm: 11561.969 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
87001
+ time (ms)
87002
+ iteration 1312/ 292968 | consumed samples: 2686976 | consumed tokens: 258473984 | elapsed time per iteration (ms): 103802.3 | learning rate: 7.165E-05 | global batch size: 2048 | lm loss: 4.374365E+00 | loss scale: 16384.0 | grad norm: 13670.889 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
87003
+ time (ms)
87004
+ iteration 1313/ 292968 | consumed samples: 2689024 | consumed tokens: 258752512 | elapsed time per iteration (ms): 103736.3 | learning rate: 7.171E-05 | global batch size: 2048 | lm loss: 4.348674E+00 | loss scale: 16384.0 | grad norm: 10213.036 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
87005
+ time (ms)
87006
+ iteration 1314/ 292968 | consumed samples: 2691072 | consumed tokens: 259031040 | elapsed time per iteration (ms): 103663.9 | learning rate: 7.176E-05 | global batch size: 2048 | lm loss: 4.331293E+00 | loss scale: 16384.0 | grad norm: 13151.653 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
87007
+ time (ms)
87008
+ iteration 1315/ 292968 | consumed samples: 2693120 | consumed tokens: 259309568 | elapsed time per iteration (ms): 103760.9 | learning rate: 7.182E-05 | global batch size: 2048 | lm loss: 4.315998E+00 | loss scale: 16384.0 | grad norm: 14473.062 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
87009
+ time (ms)
87010
+ iteration 1316/ 292968 | consumed samples: 2695168 | consumed tokens: 259588096 | elapsed time per iteration (ms): 104084.0 | learning rate: 7.187E-05 | global batch size: 2048 | lm loss: 4.349117E+00 | loss scale: 16384.0 | grad norm: 11313.236 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
87011
+ time (ms)
87012
+ iteration 1317/ 292968 | consumed samples: 2697216 | consumed tokens: 259866624 | elapsed time per iteration (ms): 105133.0 | learning rate: 7.193E-05 | global batch size: 2048 | lm loss: 4.324214E+00 | loss scale: 16384.0 | grad norm: 15165.408 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
87013
+ time (ms)
87014
+ iteration 1318/ 292968 | consumed samples: 2699264 | consumed tokens: 260145152 | elapsed time per iteration (ms): 103961.9 | learning rate: 7.198E-05 | global batch size: 2048 | lm loss: 4.297659E+00 | loss scale: 16384.0 | grad norm: 13970.172 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
87015
+ time (ms)
87016
+ iteration 1319/ 292968 | consumed samples: 2701312 | consumed tokens: 260423680 | elapsed time per iteration (ms): 103869.3 | learning rate: 7.203E-05 | global batch size: 2048 | lm loss: 4.315687E+00 | loss scale: 16384.0 | grad norm: 12823.779 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
87017
+ time (ms)
87018
+ iteration 1320/ 292968 | consumed samples: 2703360 | consumed tokens: 260702208 | elapsed time per iteration (ms): 105499.5 | learning rate: 7.209E-05 | global batch size: 2048 | lm loss: 4.339356E+00 | loss scale: 16384.0 | grad norm: 12505.072 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
87019
+ time (ms)
87020
+ iteration 1321/ 292968 | consumed samples: 2705408 | consumed tokens: 260980736 | elapsed time per iteration (ms): 106715.5 | learning rate: 7.214E-05 | global batch size: 2048 | lm loss: 4.322292E+00 | loss scale: 16384.0 | grad norm: 7680.711 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
87021
+ time (ms)
87022
+ iteration 1322/ 292968 | consumed samples: 2707456 | consumed tokens: 261259264 | elapsed time per iteration (ms): 104743.5 | learning rate: 7.220E-05 | global batch size: 2048 | lm loss: 4.303059E+00 | loss scale: 16384.0 | grad norm: 11274.482 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
87023
+ time (ms)
87024
+ iteration 1323/ 292968 | consumed samples: 2709504 | consumed tokens: 261537792 | elapsed time per iteration (ms): 108461.6 | learning rate: 7.225E-05 | global batch size: 2048 | lm loss: 4.283995E+00 | loss scale: 16384.0 | grad norm: 11434.034 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
87025
+ time (ms)
87026
+ iteration 1324/ 292968 | consumed samples: 2711552 | consumed tokens: 261816320 | elapsed time per iteration (ms): 113653.2 | learning rate: 7.231E-05 | global batch size: 2048 | lm loss: 4.292516E+00 | loss scale: 16384.0 | grad norm: 9910.438 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
87027
+ time (ms)
87028
+ iteration 1325/ 292968 | consumed samples: 2713600 | consumed tokens: 262094848 | elapsed time per iteration (ms): 113595.4 | learning rate: 7.236E-05 | global batch size: 2048 | lm loss: 4.305782E+00 | loss scale: 16384.0 | grad norm: 9792.060 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |
87029
+ time (ms)
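The added lines share one pipe-delimited `key: value` layout, so they can be checked mechanically. The sketch below is a minimal, hedged example rather than part of the training code: `parse_line` and `tokens_added` are hypothetical helper names, and the relation "tokens added per iteration = global batch size × curriculum seqlen" is only an observation that holds for the lines in this diff, not a documented guarantee. It uses the two lines where `curriculum seqlen` steps from 128 to 136.

```python
import re

# Two consecutive lines copied verbatim from the diff above (iterations 1306 and 1307).
SAMPLE = [
    "+ iteration 1306/ 292968 | consumed samples: 2674688 | consumed tokens: 256802816 | elapsed time per iteration (ms): 87566.0 | learning rate: 7.133E-05 | global batch size: 2048 | lm loss: 4.320871E+00 | loss scale: 16384.0 | grad norm: 13488.959 | num zeros: 0.0 | curriculum seqlen: 128 | number of skipped iterations: 0 | number of nan iterations: 0 |",
    "+ iteration 1307/ 292968 | consumed samples: 2676736 | consumed tokens: 257081344 | elapsed time per iteration (ms): 102038.5 | learning rate: 7.138E-05 | global batch size: 2048 | lm loss: 4.413541E+00 | loss scale: 16384.0 | grad norm: 18168.800 | num zeros: 0.0 | curriculum seqlen: 136 | number of skipped iterations: 0 | number of nan iterations: 0 |",
]


def parse_line(line):
    """Parse one iteration line into a dict (hypothetical helper, not from the repo)."""
    # Drop the diff marker and surrounding whitespace.
    line = line.lstrip("+ ").strip()
    fields = [f.strip() for f in line.split("|") if f.strip()]

    # The first field has the form "iteration 1307/ 292968".
    match = re.match(r"iteration\s+(\d+)/\s*(\d+)", fields[0])
    if match is None:
        raise ValueError("not an iteration line: " + line[:40])
    record = {
        "iteration": int(match.group(1)),
        "total iterations": int(match.group(2)),
    }

    # Every remaining field has the form "key: value" with a numeric value.
    for field in fields[1:]:
        key, _, value = field.partition(":")
        record[key.strip()] = float(value)
    return record


def tokens_added(previous, current):
    """Tokens consumed between two parsed records."""
    return int(current["consumed tokens"] - previous["consumed tokens"])


prev_rec, cur_rec = (parse_line(s) for s in SAMPLE)
expected = int(cur_rec["global batch size"] * cur_rec["curriculum seqlen"])
print(tokens_added(prev_rec, cur_rec), expected)
```

For this pair both values are 278528 (2048 × 136). For any pair of neighbouring lines at seqlen 128, the difference is 262144 (2048 × 128).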