bigscience-bot committed on
Commit 69acce6 · 1 Parent(s): 53ce2e7
Files changed (1)
  1. logs/main_log.txt +92 -0
logs/main_log.txt CHANGED
@@ -67273,3 +67273,95 @@ time (ms)
67273
  time (ms)
67274
  iteration 765/ 292968 | consumed samples: 1566720 | consumed tokens: 127303680 | elapsed time per iteration (ms): 76814.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67275
  time (ms)
67276
+ iteration 766/ 292968 | consumed samples: 1568768 | consumed tokens: 127516672 | elapsed time per iteration (ms): 77218.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67277
+ time (ms)
67278
+ iteration 767/ 292968 | consumed samples: 1570816 | consumed tokens: 127729664 | elapsed time per iteration (ms): 77724.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67279
+ time (ms)
67280
+ iteration 768/ 292968 | consumed samples: 1572864 | consumed tokens: 127942656 | elapsed time per iteration (ms): 79202.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67281
+ time (ms)
67282
+ iteration 769/ 292968 | consumed samples: 1574912 | consumed tokens: 128155648 | elapsed time per iteration (ms): 78713.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67283
+ time (ms)
67284
+ iteration 770/ 292968 | consumed samples: 1576960 | consumed tokens: 128368640 | elapsed time per iteration (ms): 78768.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67285
+ time (ms)
67286
+ iteration 771/ 292968 | consumed samples: 1579008 | consumed tokens: 128581632 | elapsed time per iteration (ms): 77027.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67287
+ time (ms)
67288
+ iteration 772/ 292968 | consumed samples: 1581056 | consumed tokens: 128794624 | elapsed time per iteration (ms): 77694.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67289
+ time (ms)
67290
+ iteration 773/ 292968 | consumed samples: 1583104 | consumed tokens: 129007616 | elapsed time per iteration (ms): 78285.6 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67291
+ time (ms)
67292
+ iteration 774/ 292968 | consumed samples: 1585152 | consumed tokens: 129220608 | elapsed time per iteration (ms): 77768.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67293
+ time (ms)
67294
+ iteration 775/ 292968 | consumed samples: 1587200 | consumed tokens: 129433600 | elapsed time per iteration (ms): 78751.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67295
+ time (ms)
67296
+ iteration 776/ 292968 | consumed samples: 1589248 | consumed tokens: 129646592 | elapsed time per iteration (ms): 78528.9 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67297
+ time (ms)
67298
+ iteration 777/ 292968 | consumed samples: 1591296 | consumed tokens: 129859584 | elapsed time per iteration (ms): 78682.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67299
+ time (ms)
67300
+ iteration 778/ 292968 | consumed samples: 1593344 | consumed tokens: 130072576 | elapsed time per iteration (ms): 77272.9 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67301
+ time (ms)
67302
+ iteration 779/ 292968 | consumed samples: 1595392 | consumed tokens: 130285568 | elapsed time per iteration (ms): 80038.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67303
+ time (ms)
67304
+ iteration 780/ 292968 | consumed samples: 1597440 | consumed tokens: 130498560 | elapsed time per iteration (ms): 77708.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67305
+ time (ms)
67306
+ iteration 781/ 292968 | consumed samples: 1599488 | consumed tokens: 130711552 | elapsed time per iteration (ms): 77785.1 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67307
+ time (ms)
67308
+ iteration 782/ 292968 | consumed samples: 1601536 | consumed tokens: 130924544 | elapsed time per iteration (ms): 77721.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67309
+ time (ms)
67310
+ iteration 783/ 292968 | consumed samples: 1603584 | consumed tokens: 131137536 | elapsed time per iteration (ms): 78420.5 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67311
+ time (ms)
67312
+ iteration 784/ 292968 | consumed samples: 1605632 | consumed tokens: 131350528 | elapsed time per iteration (ms): 78087.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67313
+ time (ms)
67314
+ iteration 785/ 292968 | consumed samples: 1607680 | consumed tokens: 131563520 | elapsed time per iteration (ms): 79958.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67315
+ time (ms)
67316
+ iteration 786/ 292968 | consumed samples: 1609728 | consumed tokens: 131776512 | elapsed time per iteration (ms): 78833.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67317
+ time (ms)
67318
+ iteration 787/ 292968 | consumed samples: 1611776 | consumed tokens: 131989504 | elapsed time per iteration (ms): 76965.1 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67319
+ time (ms)
67320
+ iteration 788/ 292968 | consumed samples: 1613824 | consumed tokens: 132202496 | elapsed time per iteration (ms): 77924.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67321
+ time (ms)
67322
+ iteration 789/ 292968 | consumed samples: 1615872 | consumed tokens: 132415488 | elapsed time per iteration (ms): 78840.8 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67323
+ time (ms)
67324
+ iteration 790/ 292968 | consumed samples: 1617920 | consumed tokens: 132628480 | elapsed time per iteration (ms): 77402.5 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67325
+ time (ms)
67326
+ iteration 791/ 292968 | consumed samples: 1619968 | consumed tokens: 132841472 | elapsed time per iteration (ms): 78261.1 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67327
+ time (ms)
67328
+ iteration 792/ 292968 | consumed samples: 1622016 | consumed tokens: 133054464 | elapsed time per iteration (ms): 80176.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67329
+ time (ms)
67330
+ iteration 793/ 292968 | consumed samples: 1624064 | consumed tokens: 133267456 | elapsed time per iteration (ms): 79974.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67331
+ time (ms)
67332
+ iteration 794/ 292968 | consumed samples: 1626112 | consumed tokens: 133480448 | elapsed time per iteration (ms): 77972.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67333
+ time (ms)
67334
+ iteration 795/ 292968 | consumed samples: 1628160 | consumed tokens: 133693440 | elapsed time per iteration (ms): 78413.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67335
+ time (ms)
67336
+ iteration 796/ 292968 | consumed samples: 1630208 | consumed tokens: 133906432 | elapsed time per iteration (ms): 79004.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67337
+ time (ms)
67338
+ iteration 797/ 292968 | consumed samples: 1632256 | consumed tokens: 134119424 | elapsed time per iteration (ms): 76848.4 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67339
+ time (ms)
67340
+ iteration 798/ 292968 | consumed samples: 1634304 | consumed tokens: 134332416 | elapsed time per iteration (ms): 78243.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67341
+ time (ms)
67342
+ iteration 799/ 292968 | consumed samples: 1636352 | consumed tokens: 134545408 | elapsed time per iteration (ms): 79156.5 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67343
+ time (ms)
67344
+ iteration 800/ 292968 | consumed samples: 1638400 | consumed tokens: 134758400 | elapsed time per iteration (ms): 77568.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67345
+ time (ms)
67346
+ iteration 801/ 292968 | consumed samples: 1640448 | consumed tokens: 134971392 | elapsed time per iteration (ms): 78323.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67347
+ time (ms)
67348
+ iteration 802/ 292968 | consumed samples: 1642496 | consumed tokens: 135184384 | elapsed time per iteration (ms): 78633.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67349
+ time (ms)
67350
+ iteration 803/ 292968 | consumed samples: 1644544 | consumed tokens: 135397376 | elapsed time per iteration (ms): 78813.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67351
+ time (ms)
67352
+ iteration 804/ 292968 | consumed samples: 1646592 | consumed tokens: 135610368 | elapsed time per iteration (ms): 78171.9 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67353
+ time (ms)
67354
+ iteration 805/ 292968 | consumed samples: 1648640 | consumed tokens: 135823360 | elapsed time per iteration (ms): 77535.3 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67355
+ time (ms)
67356
+ iteration 806/ 292968 | consumed samples: 1650688 | consumed tokens: 136036352 | elapsed time per iteration (ms): 76979.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67357
+ time (ms)
67358
+ iteration 807/ 292968 | consumed samples: 1652736 | consumed tokens: 136249344 | elapsed time per iteration (ms): 79204.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67359
+ time (ms)
67360
+ iteration 808/ 292968 | consumed samples: 1654784 | consumed tokens: 136462336 | elapsed time per iteration (ms): 77025.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67361
+ time (ms)
67362
+ iteration 809/ 292968 | consumed samples: 1656832 | consumed tokens: 136675328 | elapsed time per iteration (ms): 77032.0 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67363
+ time (ms)
67364
+ iteration 810/ 292968 | consumed samples: 1658880 | consumed tokens: 136888320 | elapsed time per iteration (ms): 78530.2 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67365
+ time (ms)
67366
+ iteration 811/ 292968 | consumed samples: 1660928 | consumed tokens: 137101312 | elapsed time per iteration (ms): 78796.7 | learning rate: 6.000E-05 | global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |
67367
+ time (ms)
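The added lines all share one `key: value | key: value` layout, so they can be parsed and sanity-checked mechanically. The sketch below is a minimal illustration, not part of the commit: the helper names `parse_iteration_line` and `tokens_per_iteration` are hypothetical, and the relationship it checks (tokens consumed per iteration equals global batch size times curriculum seqlen) is an assumption inferred from the numbers in this hunk rather than something the log states.

```python
import re

# Matches the leading "iteration N/ TOTAL" part of a log line.
ITER_RE = re.compile(r"iteration\s+(\d+)/\s*(\d+)")

# Two lines copied verbatim from the hunk above (iterations 765 and 766).
LINE_765 = (
    "iteration 765/ 292968 | consumed samples: 1566720 | consumed tokens: 127303680 | "
    "elapsed time per iteration (ms): 76814.6 | learning rate: 6.000E-05 | "
    "global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | "
    "curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |"
)
LINE_766 = (
    "+ iteration 766/ 292968 | consumed samples: 1568768 | consumed tokens: 127516672 | "
    "elapsed time per iteration (ms): 77218.7 | learning rate: 6.000E-05 | "
    "global batch size: 2048 | loss scale: 1.0 | grad norm: 45230.465 | num zeros: 0.0 | "
    "curriculum seqlen: 104 | number of skipped iterations: 0 | number of nan iterations: 0 |"
)


def parse_iteration_line(line):
    """Parse one iteration line into a dict; return None for any other line."""
    line = line.lstrip("+ ").strip()  # drop the diff "+" marker and padding
    match = ITER_RE.match(line)
    if match is None:
        return None  # e.g. the interleaved "time (ms)" lines
    record = {
        "iteration": int(match.group(1)),
        "total iterations": int(match.group(2)),
    }
    # Remaining fields are "key: value" pairs separated by "|".
    for field in line.split("|")[1:]:
        key, sep, value = field.partition(":")
        if sep:
            record[key.strip()] = float(value)
    return record


def tokens_per_iteration(record):
    """Assumed relationship: tokens per step = global batch size * curriculum seqlen."""
    return int(record["global batch size"]) * int(record["curriculum seqlen"])


if __name__ == "__main__":
    prev = parse_iteration_line(LINE_765)
    curr = parse_iteration_line(LINE_766)
    delta = int(curr["consumed tokens"] - prev["consumed tokens"])
    print(delta)                       # 212992
    print(tokens_per_iteration(curr))  # 212992
```

For these two lines the observed token delta matches the product 2048 × 104, which is consistent with the assumption; a mismatch on other lines would suggest the curriculum sequence length changed between iterations.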