bigscience-bot committed on
Commit 90a29d1 · 1 parent: a4e5b13
Files changed (1)
  1. logs/main_log.txt +144 -0
logs/main_log.txt CHANGED
@@ -106577,3 +106577,147 @@ time (ms)
106577
  time (ms)
106578
  iteration 2543/ 292968 | consumed samples: 5208064 | consumed tokens: 677871616 | elapsed time per iteration (ms): 133073.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.731538E+00 | loss scale: 131072.0 | grad norm: 53881.397 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106579
  time (ms)
106580
+ iteration 2544/ 292968 | consumed samples: 5210112 | consumed tokens: 678281216 | elapsed time per iteration (ms): 130908.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.711176E+00 | loss scale: 131072.0 | grad norm: 46917.614 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106581
+ time (ms)
106582
+ iteration 2545/ 292968 | consumed samples: 5212160 | consumed tokens: 678690816 | elapsed time per iteration (ms): 130423.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.679188E+00 | loss scale: 131072.0 | grad norm: 44071.737 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106583
+ time (ms)
106584
+ iteration 2546/ 292968 | consumed samples: 5214208 | consumed tokens: 679100416 | elapsed time per iteration (ms): 138117.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.696059E+00 | loss scale: 131072.0 | grad norm: 52838.337 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106585
+ time (ms)
106586
+ iteration 2547/ 292968 | consumed samples: 5216256 | consumed tokens: 679510016 | elapsed time per iteration (ms): 134088.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.703378E+00 | loss scale: 131072.0 | grad norm: 60797.403 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106587
+ time (ms)
106588
+ iteration 2548/ 292968 | consumed samples: 5218304 | consumed tokens: 679919616 | elapsed time per iteration (ms): 134911.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.702210E+00 | loss scale: 131072.0 | grad norm: 50331.478 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
106589
+ time (ms)
106590
+ saving checkpoint at iteration 2548 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints
106591
+ [2021-10-28 17:13:39,733] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/mp_rank_00_model_states.pt
106592
+ [2021-10-28 17:13:39,941] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/mp_rank_01_model_states.pt
106593
+ [2021-10-28 17:13:53,455] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_20_optim_states.pt
106594
+ [2021-10-28 17:13:53,470] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_58_optim_states.pt
106595
+ [2021-10-28 17:13:53,486] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_13_optim_states.pt
106596
+ [2021-10-28 17:13:53,523] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_16_optim_states.pt
106597
+ [2021-10-28 17:13:53,532] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_55_optim_states.pt
106598
+ [2021-10-28 17:13:53,559] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_15_optim_states.pt
106599
+ [2021-10-28 17:13:53,578] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_25_optim_states.pt
106600
+ [2021-10-28 17:13:53,597] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_09_optim_states.pt
106601
+ [2021-10-28 17:13:53,607] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_24_optim_states.pt
106602
+ [2021-10-28 17:13:53,662] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_38_optim_states.pt
106603
+ [2021-10-28 17:13:53,665] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_49_optim_states.pt
106604
+ [2021-10-28 17:13:53,672] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_44_optim_states.pt
106605
+ [2021-10-28 17:13:53,688] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_53_optim_states.pt
106606
+ [2021-10-28 17:13:53,718] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_51_optim_states.pt
106607
+ [2021-10-28 17:13:53,753] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_22_optim_states.pt
106608
+ [2021-10-28 17:13:53,787] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_88_optim_states.pt
106609
+ [2021-10-28 17:13:53,798] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_62_optim_states.pt
106610
+ [2021-10-28 17:13:53,808] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_10_optim_states.pt
106611
+ [2021-10-28 17:13:53,817] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_60_optim_states.pt
106612
+ [2021-10-28 17:13:53,876] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_41_optim_states.pt
106613
+ [2021-10-28 17:13:53,882] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_46_optim_states.pt
106614
+ [2021-10-28 17:13:53,882] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_73_optim_states.pt
106615
+ [2021-10-28 17:13:53,884] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_56_optim_states.pt
106616
+ [2021-10-28 17:13:53,929] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_40_optim_states.pt
106617
+ [2021-10-28 17:13:53,954] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_35_optim_states.pt
106618
+ [2021-10-28 17:13:53,966] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_101_optim_states.pt
106619
+ [2021-10-28 17:13:54,034] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_91_optim_states.pt
106620
+ [2021-10-28 17:13:54,043] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_74_optim_states.pt
106621
+ [2021-10-28 17:13:54,206] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_99_optim_states.pt
106622
+ [2021-10-28 17:13:54,207] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_34_optim_states.pt
106623
+ [2021-10-28 17:13:54,220] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_17_optim_states.pt
106624
+ [2021-10-28 17:13:54,542] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_102_optim_states.pt
106625
+ [2021-10-28 17:13:54,564] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_43_optim_states.pt
106626
+ [2021-10-28 17:13:54,565] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_77_optim_states.pt
106627
+ [2021-10-28 17:13:54,589] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_28_optim_states.pt
106628
+ [2021-10-28 17:13:54,591] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_04_optim_states.pt
106629
+ [2021-10-28 17:13:54,600] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_70_optim_states.pt
106630
+ [2021-10-28 17:13:54,615] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_14_optim_states.pt
106631
+ [2021-10-28 17:13:54,620] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_107_optim_states.pt
106632
+ [2021-10-28 17:13:54,624] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_26_optim_states.pt
106633
+ [2021-10-28 17:13:54,631] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_37_optim_states.pt
106634
+ [2021-10-28 17:13:54,645] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_50_optim_states.pt
106635
+ [2021-10-28 17:13:54,650] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_57_optim_states.pt
106636
+ [2021-10-28 17:13:54,679] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_27_optim_states.pt
106637
+ [2021-10-28 17:13:54,679] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_23_optim_states.pt
106638
+ [2021-10-28 17:13:54,681] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_29_optim_states.pt
106639
+ [2021-10-28 17:13:54,689] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_84_optim_states.pt
106640
+ [2021-10-28 17:13:54,690] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_114_optim_states.pt
106641
+ [2021-10-28 17:13:54,698] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_52_optim_states.pt
106642
+ [2021-10-28 17:13:54,707] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_19_optim_states.pt
106643
+ [2021-10-28 17:13:54,717] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_61_optim_states.pt
106644
+ [2021-10-28 17:13:54,725] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_47_optim_states.pt
106645
+ [2021-10-28 17:13:54,780] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_59_optim_states.pt
106646
+ [2021-10-28 17:13:54,790] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_97_optim_states.pt
106647
+ [2021-10-28 17:13:54,796] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_63_optim_states.pt
106648
+ [2021-10-28 17:13:54,804] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_109_optim_states.pt
106649
+ [2021-10-28 17:13:54,819] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_11_optim_states.pt
106650
+ [2021-10-28 17:13:54,848] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_06_optim_states.pt
106651
+ [2021-10-28 17:13:54,859] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_12_optim_states.pt
106652
+ [2021-10-28 17:13:54,874] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_48_optim_states.pt
106653
+ [2021-10-28 17:13:54,895] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_21_optim_states.pt
106654
+ [2021-10-28 17:13:54,899] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_54_optim_states.pt
106655
+ [2021-10-28 17:13:54,914] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_18_optim_states.pt
106656
+ [2021-10-28 17:13:54,916] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_122_optim_states.pt
106657
+ [2021-10-28 17:13:54,941] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_111_optim_states.pt
106658
+ [2021-10-28 17:13:54,965] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_79_optim_states.pt
106659
+ [2021-10-28 17:13:54,974] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_33_optim_states.pt
106660
+ [2021-10-28 17:13:54,975] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_80_optim_states.pt
106661
+ [2021-10-28 17:13:54,978] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_42_optim_states.pt
106662
+ [2021-10-28 17:13:54,983] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_87_optim_states.pt
106663
+ [2021-10-28 17:13:55,002] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_68_optim_states.pt
106664
+ [2021-10-28 17:13:55,008] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_08_optim_states.pt
106665
+ [2021-10-28 17:13:55,015] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_32_optim_states.pt
106666
+ [2021-10-28 17:13:55,030] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_103_optim_states.pt
106667
+ [2021-10-28 17:13:55,034] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_100_optim_states.pt
106668
+ [2021-10-28 17:13:55,047] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_123_optim_states.pt
106669
+ [2021-10-28 17:13:55,079] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_106_optim_states.pt
106670
+ [2021-10-28 17:13:55,080] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_117_optim_states.pt
106671
+ [2021-10-28 17:13:55,092] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_65_optim_states.pt
106672
+ [2021-10-28 17:13:55,098] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_116_optim_states.pt
106673
+ [2021-10-28 17:13:55,101] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_113_optim_states.pt
106674
+ [2021-10-28 17:13:55,127] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_64_optim_states.pt
106675
+ [2021-10-28 17:13:55,144] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_45_optim_states.pt
106676
+ [2021-10-28 17:13:55,160] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_96_optim_states.pt
106677
+ [2021-10-28 17:13:55,248] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_98_optim_states.pt
106678
+ [2021-10-28 17:13:55,250] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_31_optim_states.pt
106679
+ [2021-10-28 17:13:55,252] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_30_optim_states.pt
106680
+ [2021-10-28 17:13:55,264] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_105_optim_states.pt
106681
+ [2021-10-28 17:13:55,392] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_104_optim_states.pt
106682
+ [2021-10-28 17:13:55,441] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_81_optim_states.pt
106683
+ [2021-10-28 17:13:55,464] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_83_optim_states.pt
106684
+ [2021-10-28 17:13:55,506] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_82_optim_states.pt
106685
+ [2021-10-28 17:13:55,520] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_112_optim_states.pt
106686
+ [2021-10-28 17:13:55,559] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_118_optim_states.pt
106687
+ [2021-10-28 17:13:55,563] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_69_optim_states.pt
106688
+ [2021-10-28 17:13:55,590] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_115_optim_states.pt
106689
+ [2021-10-28 17:13:55,596] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_71_optim_states.pt
106690
+ [2021-10-28 17:13:55,603] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_78_optim_states.pt
106691
+ [2021-10-28 17:13:55,668] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_121_optim_states.pt
106692
+ [2021-10-28 17:13:55,685] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_119_optim_states.pt
106693
+ [2021-10-28 17:13:55,693] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_120_optim_states.pt
106694
+ [2021-10-28 17:13:55,749] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_66_optim_states.pt
106695
+ [2021-10-28 17:13:55,789] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_76_optim_states.pt
106696
+ [2021-10-28 17:13:55,801] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_67_optim_states.pt
106697
+ [2021-10-28 17:13:56,296] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_00_optim_states.pt
106698
+ [2021-10-28 17:13:56,445] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_124_optim_states.pt
106699
+ [2021-10-28 17:13:56,612] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_03_optim_states.pt
106700
+ [2021-10-28 17:13:56,740] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_127_optim_states.pt
106701
+ [2021-10-28 17:13:57,647] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_07_optim_states.pt
106702
+ [2021-10-28 17:13:57,658] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_02_optim_states.pt
106703
+ [2021-10-28 17:13:57,909] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_126_optim_states.pt
106704
+ [2021-10-28 17:13:57,967] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_05_optim_states.pt
106705
+ [2021-10-28 17:13:58,010] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_125_optim_states.pt
106706
+ [2021-10-28 17:13:58,032] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_01_optim_states.pt
106707
+ [2021-10-28 17:14:00,209] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_36_optim_states.pt
106708
+ [2021-10-28 17:14:00,309] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_90_optim_states.pt
106709
+ [2021-10-28 17:14:00,312] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_110_optim_states.pt
106710
+ [2021-10-28 17:14:00,743] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_39_optim_states.pt
106711
+ [2021-10-28 17:14:00,748] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_89_optim_states.pt
106712
+ [2021-10-28 17:14:01,156] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_108_optim_states.pt
106713
+ [2021-10-28 17:14:01,205] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_93_optim_states.pt
106714
+ [2021-10-28 17:14:02,800] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_95_optim_states.pt
106715
+ [2021-10-28 17:14:03,558] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_94_optim_states.pt
106716
+ [2021-10-28 17:14:04,345] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_86_optim_states.pt
106717
+ [2021-10-28 17:14:04,864] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_75_optim_states.pt
106718
+ [2021-10-28 17:14:05,058] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_85_optim_states.pt
106719
+ [2021-10-28 17:14:05,492] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_92_optim_states.pt
106720
+ [2021-10-28 17:14:06,165] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step2548/zero_pp_rank_0_mp_rank_72_optim_states.pt
106721
+ successfully saved checkpoint at iteration 2548 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints
106722
+ time (ms) | save-checkpoint: 29252.10
106723
+ [exiting program after 1190.94853798151 minutes] datetime: 2021-10-28 17:14:06
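The iteration lines in this hunk use a fixed `iteration N/ TOTAL | key: value | ...` layout, so the token accounting can be checked mechanically. Below is a minimal Python sketch. It assumes every metric line has exactly this pipe-separated format, as the six lines above do. The helpers `parse_iteration`, `check_token_accounting` and `tokens_per_second` are hypothetical names for illustration and are not part of Megatron-DeepSpeed. The sketch also assumes that a step's tokens equal `global batch size` × that step's `curriculum seqlen`. For these lines, every step should add 2048 samples and 2048 × 200 = 409,600 tokens.

```python
import re

# Iteration lines copied verbatim from the log hunk (iterations 2543-2548).
LOG = """\
iteration 2543/ 292968 | consumed samples: 5208064 | consumed tokens: 677871616 | elapsed time per iteration (ms): 133073.7 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.731538E+00 | loss scale: 131072.0 | grad norm: 53881.397 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 2544/ 292968 | consumed samples: 5210112 | consumed tokens: 678281216 | elapsed time per iteration (ms): 130908.9 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.711176E+00 | loss scale: 131072.0 | grad norm: 46917.614 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 2545/ 292968 | consumed samples: 5212160 | consumed tokens: 678690816 | elapsed time per iteration (ms): 130423.1 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.679188E+00 | loss scale: 131072.0 | grad norm: 44071.737 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 2546/ 292968 | consumed samples: 5214208 | consumed tokens: 679100416 | elapsed time per iteration (ms): 138117.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.696059E+00 | loss scale: 131072.0 | grad norm: 52838.337 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 2547/ 292968 | consumed samples: 5216256 | consumed tokens: 679510016 | elapsed time per iteration (ms): 134088.6 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.703378E+00 | loss scale: 131072.0 | grad norm: 60797.403 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 2548/ 292968 | consumed samples: 5218304 | consumed tokens: 679919616 | elapsed time per iteration (ms): 134911.2 | learning rate: 1.000E-04 | global batch size: 2048 | lm loss: 3.702210E+00 | loss scale: 131072.0 | grad norm: 50331.478 | num zeros: 0.0 | curriculum seqlen: 200 | number of skipped iterations: 0 | number of nan iterations: 0 |
"""

# Matches the leading "iteration N/ TOTAL" part of a metric line.
HEAD = re.compile(r"iteration\s+(\d+)/\s*(\d+)")


def to_number(text):
    """Return an int for integral text (e.g. '2048'), else a float ('1.000E-04')."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def parse_iteration(line):
    """Split one 'iteration N/ TOTAL | key: value | ...' line into a dict."""
    head, *fields = line.strip().strip("|").split("|")
    match = HEAD.match(head.strip())
    if match is None:
        raise ValueError(f"not an iteration line: {line[:40]!r}")
    record = {"iteration": int(match.group(1)),
              "total iterations": int(match.group(2))}
    for field in fields:
        # Keys such as "elapsed time per iteration (ms)" contain no colon.
        key, value = field.split(":", 1)
        record[key.strip()] = to_number(value.strip())
    return record


def check_token_accounting(records):
    """Assert each step adds one batch of samples and batch * seqlen tokens."""
    for prev, cur in zip(records, records[1:]):
        batch = cur["global batch size"]
        seqlen = cur["curriculum seqlen"]  # assumption: the current step's seqlen applies
        assert cur["iteration"] == prev["iteration"] + 1
        assert cur["consumed samples"] - prev["consumed samples"] == batch
        assert cur["consumed tokens"] - prev["consumed tokens"] == batch * seqlen


def tokens_per_second(record):
    """Approximate throughput of one iteration from its wall-clock time."""
    tokens = record["global batch size"] * record["curriculum seqlen"]
    return tokens / (record["elapsed time per iteration (ms)"] / 1000.0)


records = [parse_iteration(line) for line in LOG.splitlines()
           if line.startswith("iteration")]
check_token_accounting(records)
for r in records:
    print(r["iteration"], r["lm loss"], round(tokens_per_second(r)))
```

At 130 to 138 s per iteration, this works out to roughly 2,970 to 3,140 tokens per second for the whole job at this curriculum stage. The figure covers only the reported iteration time and excludes the separate `save-checkpoint` time.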