bigscience-bot committed on
Commit
1d6b6d1
·
1 Parent(s): bb486b9
Files changed (1)
  1. logs/main_log.txt +494 -0
logs/main_log.txt CHANGED
@@ -39804,3 +39804,497 @@ time (ms)
39804
  time (ms)
39805
  [2021-09-24 18:07:07] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
39806
  [2021-09-24 18:07:07] PULSE: tr8-104B is running for 12:14:56 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
39807
+ iteration 4073/ 159576 | consumed samples: 83088 | elapsed time per iteration (ms): 14430.9 | learning rate: 2.301E-05 | global batch size: 32 | lm loss: 6.464416E+00 | loss scale: 16384.0 | grad norm: 92935.764 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39808
+ time (ms)
39809
+ iteration 4074/ 159576 | consumed samples: 83120 | elapsed time per iteration (ms): 14595.5 | learning rate: 2.302E-05 | global batch size: 32 | lm loss: 6.394172E+00 | loss scale: 16384.0 | grad norm: 93727.497 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39810
+ time (ms)
39811
+ iteration 4075/ 159576 | consumed samples: 83152 | elapsed time per iteration (ms): 14478.6 | learning rate: 2.303E-05 | global batch size: 32 | lm loss: 6.535138E+00 | loss scale: 16384.0 | grad norm: 110910.133 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39812
+ time (ms)
39813
+ iteration 4076/ 159576 | consumed samples: 83184 | elapsed time per iteration (ms): 14559.7 | learning rate: 2.304E-05 | global batch size: 32 | lm loss: 6.459756E+00 | loss scale: 16384.0 | grad norm: 79798.141 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39814
+ time (ms)
39815
+ iteration 4077/ 159576 | consumed samples: 83216 | elapsed time per iteration (ms): 14529.0 | learning rate: 2.305E-05 | global batch size: 32 | lm loss: 6.388766E+00 | loss scale: 16384.0 | grad norm: 80153.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39816
+ time (ms)
39817
+ iteration 4078/ 159576 | consumed samples: 83248 | elapsed time per iteration (ms): 15028.3 | learning rate: 2.305E-05 | global batch size: 32 | lm loss: 6.462305E+00 | loss scale: 16384.0 | grad norm: 72541.670 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39818
+ time (ms)
39819
+ iteration 4079/ 159576 | consumed samples: 83280 | elapsed time per iteration (ms): 14501.7 | learning rate: 2.306E-05 | global batch size: 32 | lm loss: 6.606649E+00 | loss scale: 16384.0 | grad norm: 72682.132 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39820
+ time (ms)
39821
+ iteration 4080/ 159576 | consumed samples: 83312 | elapsed time per iteration (ms): 14478.7 | learning rate: 2.307E-05 | global batch size: 32 | lm loss: 6.339183E+00 | loss scale: 16384.0 | grad norm: 77952.104 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39822
+ time (ms)
39823
+ iteration 4081/ 159576 | consumed samples: 83344 | elapsed time per iteration (ms): 14534.3 | learning rate: 2.308E-05 | global batch size: 32 | lm loss: 6.482682E+00 | loss scale: 16384.0 | grad norm: 78541.532 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39824
+ time (ms)
39825
+ iteration 4082/ 159576 | consumed samples: 83376 | elapsed time per iteration (ms): 14971.6 | learning rate: 2.309E-05 | global batch size: 32 | lm loss: 6.464870E+00 | loss scale: 16384.0 | grad norm: 82812.736 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39826
+ time (ms)
39827
+ iteration 4083/ 159576 | consumed samples: 83408 | elapsed time per iteration (ms): 14619.1 | learning rate: 2.310E-05 | global batch size: 32 | lm loss: 6.468065E+00 | loss scale: 16384.0 | grad norm: 95549.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39828
+ time (ms)
39829
+ iteration 4084/ 159576 | consumed samples: 83440 | elapsed time per iteration (ms): 14580.8 | learning rate: 2.311E-05 | global batch size: 32 | lm loss: 6.390970E+00 | loss scale: 16384.0 | grad norm: 76775.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39830
+ time (ms)
39831
+ iteration 4085/ 159576 | consumed samples: 83472 | elapsed time per iteration (ms): 14597.4 | learning rate: 2.312E-05 | global batch size: 32 | lm loss: 6.441597E+00 | loss scale: 16384.0 | grad norm: 87885.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39832
+ time (ms)
39833
+ iteration 4086/ 159576 | consumed samples: 83504 | elapsed time per iteration (ms): 14827.9 | learning rate: 2.313E-05 | global batch size: 32 | lm loss: 6.332308E+00 | loss scale: 16384.0 | grad norm: 67530.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39834
+ time (ms)
39835
+ iteration 4087/ 159576 | consumed samples: 83536 | elapsed time per iteration (ms): 14496.3 | learning rate: 2.313E-05 | global batch size: 32 | lm loss: 6.360069E+00 | loss scale: 16384.0 | grad norm: 65277.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39836
+ time (ms)
39837
+ iteration 4088/ 159576 | consumed samples: 83568 | elapsed time per iteration (ms): 14505.1 | learning rate: 2.314E-05 | global batch size: 32 | lm loss: 6.331870E+00 | loss scale: 16384.0 | grad norm: 73276.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39838
+ time (ms)
39839
+ iteration 4089/ 159576 | consumed samples: 83600 | elapsed time per iteration (ms): 14518.3 | learning rate: 2.315E-05 | global batch size: 32 | lm loss: 6.279953E+00 | loss scale: 16384.0 | grad norm: 69193.657 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39840
+ time (ms)
39841
+ iteration 4090/ 159576 | consumed samples: 83632 | elapsed time per iteration (ms): 14816.9 | learning rate: 2.316E-05 | global batch size: 32 | lm loss: 6.473932E+00 | loss scale: 16384.0 | grad norm: 78838.749 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39842
+ time (ms)
39843
+ iteration 4091/ 159576 | consumed samples: 83664 | elapsed time per iteration (ms): 14589.1 | learning rate: 2.317E-05 | global batch size: 32 | lm loss: 6.346605E+00 | loss scale: 16384.0 | grad norm: 76401.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39844
+ time (ms)
39845
+ iteration 4092/ 159576 | consumed samples: 83696 | elapsed time per iteration (ms): 14611.5 | learning rate: 2.318E-05 | global batch size: 32 | lm loss: 6.444325E+00 | loss scale: 16384.0 | grad norm: 85411.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39846
+ time (ms)
39847
+ iteration 4093/ 159576 | consumed samples: 83728 | elapsed time per iteration (ms): 14540.2 | learning rate: 2.319E-05 | global batch size: 32 | lm loss: 6.498468E+00 | loss scale: 16384.0 | grad norm: 97013.865 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39848
+ time (ms)
39849
+ iteration 4094/ 159576 | consumed samples: 83760 | elapsed time per iteration (ms): 14934.5 | learning rate: 2.320E-05 | global batch size: 32 | lm loss: 6.368524E+00 | loss scale: 16384.0 | grad norm: 75310.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39850
+ time (ms)
39851
+ iteration 4095/ 159576 | consumed samples: 83792 | elapsed time per iteration (ms): 14479.4 | learning rate: 2.321E-05 | global batch size: 32 | lm loss: 6.445729E+00 | loss scale: 16384.0 | grad norm: 79666.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39852
+ time (ms)
39853
+ iteration 4096/ 159576 | consumed samples: 83824 | elapsed time per iteration (ms): 14539.3 | learning rate: 2.321E-05 | global batch size: 32 | lm loss: 6.478226E+00 | loss scale: 16384.0 | grad norm: 74953.641 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39854
+ time (ms)
39855
+ iteration 4097/ 159576 | consumed samples: 83856 | elapsed time per iteration (ms): 14544.9 | learning rate: 2.322E-05 | global batch size: 32 | lm loss: 6.494800E+00 | loss scale: 16384.0 | grad norm: 83444.792 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39856
+ time (ms)
39857
+ iteration 4098/ 159576 | consumed samples: 83888 | elapsed time per iteration (ms): 14987.3 | learning rate: 2.323E-05 | global batch size: 32 | lm loss: 6.549989E+00 | loss scale: 16384.0 | grad norm: 73065.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39858
+ time (ms)
39859
+ iteration 4099/ 159576 | consumed samples: 83920 | elapsed time per iteration (ms): 14510.7 | learning rate: 2.324E-05 | global batch size: 32 | lm loss: 6.523539E+00 | loss scale: 16384.0 | grad norm: 83625.749 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39860
+ time (ms)
39861
+ iteration 4100/ 159576 | consumed samples: 83952 | elapsed time per iteration (ms): 14610.5 | learning rate: 2.325E-05 | global batch size: 32 | lm loss: 6.451036E+00 | loss scale: 16384.0 | grad norm: 74563.493 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39862
+ time (ms)
39863
+ iteration 4101/ 159576 | consumed samples: 83984 | elapsed time per iteration (ms): 14604.4 | learning rate: 2.326E-05 | global batch size: 32 | lm loss: 6.472479E+00 | loss scale: 16384.0 | grad norm: 109783.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39864
+ time (ms)
39865
+ iteration 4102/ 159576 | consumed samples: 84016 | elapsed time per iteration (ms): 14804.2 | learning rate: 2.327E-05 | global batch size: 32 | lm loss: 6.392324E+00 | loss scale: 16384.0 | grad norm: 77708.767 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39866
+ time (ms)
39867
+ iteration 4103/ 159576 | consumed samples: 84048 | elapsed time per iteration (ms): 14666.7 | learning rate: 2.328E-05 | global batch size: 32 | lm loss: 6.388014E+00 | loss scale: 16384.0 | grad norm: 72228.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39868
+ time (ms)
39869
+ iteration 4104/ 159576 | consumed samples: 84080 | elapsed time per iteration (ms): 14567.0 | learning rate: 2.329E-05 | global batch size: 32 | lm loss: 6.351237E+00 | loss scale: 16384.0 | grad norm: 75762.926 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39870
+ time (ms)
39871
+ iteration 4105/ 159576 | consumed samples: 84112 | elapsed time per iteration (ms): 14512.3 | learning rate: 2.329E-05 | global batch size: 32 | lm loss: 6.445687E+00 | loss scale: 16384.0 | grad norm: 71985.473 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39872
+ time (ms)
39873
+ iteration 4106/ 159576 | consumed samples: 84144 | elapsed time per iteration (ms): 14555.0 | learning rate: 2.330E-05 | global batch size: 32 | lm loss: 6.450569E+00 | loss scale: 16384.0 | grad norm: 70873.734 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39874
+ time (ms)
39875
+ iteration 4107/ 159576 | consumed samples: 84176 | elapsed time per iteration (ms): 14836.4 | learning rate: 2.331E-05 | global batch size: 32 | lm loss: 6.490268E+00 | loss scale: 16384.0 | grad norm: 62324.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39876
+ time (ms)
39877
+ iteration 4108/ 159576 | consumed samples: 84208 | elapsed time per iteration (ms): 14607.5 | learning rate: 2.332E-05 | global batch size: 32 | lm loss: 6.503112E+00 | loss scale: 16384.0 | grad norm: 80147.648 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39878
+ time (ms)
39879
+ iteration 4109/ 159576 | consumed samples: 84240 | elapsed time per iteration (ms): 14516.1 | learning rate: 2.333E-05 | global batch size: 32 | lm loss: 6.575756E+00 | loss scale: 16384.0 | grad norm: 85277.958 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39880
+ time (ms)
39881
+ iteration 4110/ 159576 | consumed samples: 84272 | elapsed time per iteration (ms): 14534.3 | learning rate: 2.334E-05 | global batch size: 32 | lm loss: 6.521991E+00 | loss scale: 16384.0 | grad norm: 88147.911 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39882
+ time (ms)
39883
+ iteration 4111/ 159576 | consumed samples: 84304 | elapsed time per iteration (ms): 14643.4 | learning rate: 2.335E-05 | global batch size: 32 | lm loss: 6.583647E+00 | loss scale: 16384.0 | grad norm: 90470.119 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39884
+ time (ms)
39885
+ iteration 4112/ 159576 | consumed samples: 84336 | elapsed time per iteration (ms): 14501.6 | learning rate: 2.336E-05 | global batch size: 32 | lm loss: 6.307788E+00 | loss scale: 16384.0 | grad norm: 84679.029 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39886
+ time (ms)
39887
+ iteration 4113/ 159576 | consumed samples: 84368 | elapsed time per iteration (ms): 14565.5 | learning rate: 2.337E-05 | global batch size: 32 | lm loss: 6.392709E+00 | loss scale: 16384.0 | grad norm: 85222.050 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39888
+ time (ms)
39889
+ iteration 4114/ 159576 | consumed samples: 84400 | elapsed time per iteration (ms): 14580.4 | learning rate: 2.337E-05 | global batch size: 32 | lm loss: 6.384982E+00 | loss scale: 16384.0 | grad norm: 101932.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39890
+ time (ms)
39891
+ iteration 4115/ 159576 | consumed samples: 84432 | elapsed time per iteration (ms): 14793.7 | learning rate: 2.338E-05 | global batch size: 32 | lm loss: 6.402984E+00 | loss scale: 16384.0 | grad norm: 80725.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39892
+ time (ms)
39893
+ iteration 4116/ 159576 | consumed samples: 84464 | elapsed time per iteration (ms): 14599.8 | learning rate: 2.339E-05 | global batch size: 32 | lm loss: 6.431032E+00 | loss scale: 16384.0 | grad norm: 88365.957 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39894
+ time (ms)
39895
+ iteration 4117/ 159576 | consumed samples: 84496 | elapsed time per iteration (ms): 14529.0 | learning rate: 2.340E-05 | global batch size: 32 | lm loss: 6.544386E+00 | loss scale: 16384.0 | grad norm: 94647.177 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39896
+ time (ms)
39897
+ iteration 4118/ 159576 | consumed samples: 84528 | elapsed time per iteration (ms): 14520.8 | learning rate: 2.341E-05 | global batch size: 32 | lm loss: 6.494756E+00 | loss scale: 16384.0 | grad norm: 127914.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39898
+ time (ms)
39899
+ iteration 4119/ 159576 | consumed samples: 84560 | elapsed time per iteration (ms): 14810.4 | learning rate: 2.342E-05 | global batch size: 32 | lm loss: 6.676927E+00 | loss scale: 16384.0 | grad norm: 255152.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39900
+ time (ms)
39901
+ iteration 4120/ 159576 | consumed samples: 84592 | elapsed time per iteration (ms): 14553.6 | learning rate: 2.343E-05 | global batch size: 32 | lm loss: 6.521421E+00 | loss scale: 16384.0 | grad norm: 88738.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39902
+ time (ms)
39903
+ iteration 4121/ 159576 | consumed samples: 84624 | elapsed time per iteration (ms): 14615.1 | learning rate: 2.344E-05 | global batch size: 32 | lm loss: 6.422895E+00 | loss scale: 16384.0 | grad norm: 69394.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39904
+ time (ms)
39905
+ iteration 4122/ 159576 | consumed samples: 84656 | elapsed time per iteration (ms): 14526.7 | learning rate: 2.345E-05 | global batch size: 32 | lm loss: 6.391778E+00 | loss scale: 16384.0 | grad norm: 75006.078 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39906
+ time (ms)
39907
+ iteration 4123/ 159576 | consumed samples: 84688 | elapsed time per iteration (ms): 14981.6 | learning rate: 2.345E-05 | global batch size: 32 | lm loss: 6.569616E+00 | loss scale: 16384.0 | grad norm: 89357.812 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39908
+ time (ms)
39909
+ iteration 4124/ 159576 | consumed samples: 84720 | elapsed time per iteration (ms): 14751.3 | learning rate: 2.346E-05 | global batch size: 32 | lm loss: 6.522147E+00 | loss scale: 16384.0 | grad norm: 83006.179 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39910
+ time (ms)
39911
+ iteration 4125/ 159576 | consumed samples: 84752 | elapsed time per iteration (ms): 14464.7 | learning rate: 2.347E-05 | global batch size: 32 | lm loss: 6.443343E+00 | loss scale: 16384.0 | grad norm: 85692.827 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39912
+ time (ms)
39913
+ iteration 4126/ 159576 | consumed samples: 84784 | elapsed time per iteration (ms): 14544.8 | learning rate: 2.348E-05 | global batch size: 32 | lm loss: 6.447396E+00 | loss scale: 16384.0 | grad norm: 75026.495 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39914
+ time (ms)
39915
+ iteration 4127/ 159576 | consumed samples: 84816 | elapsed time per iteration (ms): 14837.3 | learning rate: 2.349E-05 | global batch size: 32 | lm loss: 6.407457E+00 | loss scale: 16384.0 | grad norm: 68031.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39916
+ time (ms)
39917
+ iteration 4128/ 159576 | consumed samples: 84848 | elapsed time per iteration (ms): 14497.8 | learning rate: 2.350E-05 | global batch size: 32 | lm loss: 6.509037E+00 | loss scale: 16384.0 | grad norm: 81823.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39918
+ time (ms)
39919
+ iteration 4129/ 159576 | consumed samples: 84880 | elapsed time per iteration (ms): 14560.1 | learning rate: 2.351E-05 | global batch size: 32 | lm loss: 6.349816E+00 | loss scale: 16384.0 | grad norm: 72346.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39920
+ time (ms)
39921
+ iteration 4130/ 159576 | consumed samples: 84912 | elapsed time per iteration (ms): 14548.5 | learning rate: 2.352E-05 | global batch size: 32 | lm loss: 6.479569E+00 | loss scale: 16384.0 | grad norm: 87336.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39922
+ time (ms)
39923
+ iteration 4131/ 159576 | consumed samples: 84944 | elapsed time per iteration (ms): 14910.1 | learning rate: 2.353E-05 | global batch size: 32 | lm loss: 6.617517E+00 | loss scale: 16384.0 | grad norm: 86374.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39924
+ time (ms)
39925
+ iteration 4132/ 159576 | consumed samples: 84976 | elapsed time per iteration (ms): 14494.2 | learning rate: 2.353E-05 | global batch size: 32 | lm loss: 6.465295E+00 | loss scale: 16384.0 | grad norm: 84022.738 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39926
+ time (ms)
39927
+ iteration 4133/ 159576 | consumed samples: 85008 | elapsed time per iteration (ms): 14507.6 | learning rate: 2.354E-05 | global batch size: 32 | lm loss: 6.496157E+00 | loss scale: 16384.0 | grad norm: 84787.804 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39928
+ time (ms)
39929
+ iteration 4134/ 159576 | consumed samples: 85040 | elapsed time per iteration (ms): 14524.7 | learning rate: 2.355E-05 | global batch size: 32 | lm loss: 6.413724E+00 | loss scale: 16384.0 | grad norm: 85852.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39930
+ time (ms)
39931
+ iteration 4135/ 159576 | consumed samples: 85072 | elapsed time per iteration (ms): 14838.8 | learning rate: 2.356E-05 | global batch size: 32 | lm loss: 6.625166E+00 | loss scale: 16384.0 | grad norm: 94635.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39932
+ time (ms)
39933
+ iteration 4136/ 159576 | consumed samples: 85104 | elapsed time per iteration (ms): 14542.4 | learning rate: 2.357E-05 | global batch size: 32 | lm loss: 6.407034E+00 | loss scale: 16384.0 | grad norm: 84861.680 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39934
+ time (ms)
39935
+ iteration 4137/ 159576 | consumed samples: 85136 | elapsed time per iteration (ms): 14613.1 | learning rate: 2.358E-05 | global batch size: 32 | lm loss: 6.522691E+00 | loss scale: 16384.0 | grad norm: 90819.589 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39936
+ time (ms)
39937
+ iteration 4138/ 159576 | consumed samples: 85168 | elapsed time per iteration (ms): 14588.1 | learning rate: 2.359E-05 | global batch size: 32 | lm loss: 6.515704E+00 | loss scale: 16384.0 | grad norm: 84641.662 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39938
+ time (ms)
39939
+ iteration 4139/ 159576 | consumed samples: 85200 | elapsed time per iteration (ms): 14775.7 | learning rate: 2.360E-05 | global batch size: 32 | lm loss: 6.462790E+00 | loss scale: 16384.0 | grad norm: 109335.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39940
+ time (ms)
39941
+ iteration 4140/ 159576 | consumed samples: 85232 | elapsed time per iteration (ms): 14632.9 | learning rate: 2.361E-05 | global batch size: 32 | lm loss: 6.565165E+00 | loss scale: 16384.0 | grad norm: 101408.740 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39942
+ time (ms)
39943
+ iteration 4141/ 159576 | consumed samples: 85264 | elapsed time per iteration (ms): 14488.2 | learning rate: 2.361E-05 | global batch size: 32 | lm loss: 6.378877E+00 | loss scale: 16384.0 | grad norm: 85177.703 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39944
+ time (ms)
39945
+ iteration 4142/ 159576 | consumed samples: 85296 | elapsed time per iteration (ms): 14538.0 | learning rate: 2.362E-05 | global batch size: 32 | lm loss: 6.464640E+00 | loss scale: 16384.0 | grad norm: 107413.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39946
+ time (ms)
39947
+ iteration 4143/ 159576 | consumed samples: 85328 | elapsed time per iteration (ms): 14656.2 | learning rate: 2.363E-05 | global batch size: 32 | lm loss: 6.672103E+00 | loss scale: 16384.0 | grad norm: 79187.829 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39948
+ time (ms)
39949
+ iteration 4144/ 159576 | consumed samples: 85360 | elapsed time per iteration (ms): 14916.7 | learning rate: 2.364E-05 | global batch size: 32 | lm loss: 6.691429E+00 | loss scale: 16384.0 | grad norm: 105292.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39950
+ time (ms)
39951
+ iteration 4145/ 159576 | consumed samples: 85392 | elapsed time per iteration (ms): 14496.1 | learning rate: 2.365E-05 | global batch size: 32 | lm loss: 6.428411E+00 | loss scale: 16384.0 | grad norm: 81232.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39952
+ time (ms)
39953
+ iteration 4146/ 159576 | consumed samples: 85424 | elapsed time per iteration (ms): 14532.5 | learning rate: 2.366E-05 | global batch size: 32 | lm loss: 6.483904E+00 | loss scale: 16384.0 | grad norm: 117143.742 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39954
+ time (ms)
39955
+ iteration 4147/ 159576 | consumed samples: 85456 | elapsed time per iteration (ms): 14531.1 | learning rate: 2.367E-05 | global batch size: 32 | lm loss: 6.363456E+00 | loss scale: 16384.0 | grad norm: 88860.011 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39956
+ time (ms)
39957
+ iteration 4148/ 159576 | consumed samples: 85488 | elapsed time per iteration (ms): 14766.7 | learning rate: 2.368E-05 | global batch size: 32 | lm loss: 6.523079E+00 | loss scale: 16384.0 | grad norm: 87677.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39958
+ time (ms)
39959
+ iteration 4149/ 159576 | consumed samples: 85520 | elapsed time per iteration (ms): 14507.2 | learning rate: 2.368E-05 | global batch size: 32 | lm loss: 6.553520E+00 | loss scale: 16384.0 | grad norm: 121742.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39960
+ time (ms)
39961
+ iteration 4150/ 159576 | consumed samples: 85552 | elapsed time per iteration (ms): 14548.6 | learning rate: 2.369E-05 | global batch size: 32 | lm loss: 6.490498E+00 | loss scale: 16384.0 | grad norm: 89599.956 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39962
+ time (ms)
39963
+ iteration 4151/ 159576 | consumed samples: 85584 | elapsed time per iteration (ms): 14535.8 | learning rate: 2.370E-05 | global batch size: 32 | lm loss: 6.498284E+00 | loss scale: 16384.0 | grad norm: 103857.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39964
+ time (ms)
39965
+ iteration 4152/ 159576 | consumed samples: 85616 | elapsed time per iteration (ms): 14637.7 | learning rate: 2.371E-05 | global batch size: 32 | lm loss: 6.607250E+00 | loss scale: 16384.0 | grad norm: 80792.955 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39966
+ time (ms)
39967
+ iteration 4153/ 159576 | consumed samples: 85648 | elapsed time per iteration (ms): 14584.8 | learning rate: 2.372E-05 | global batch size: 32 | lm loss: 6.465719E+00 | loss scale: 16384.0 | grad norm: 76852.004 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39968
+ time (ms)
39969
+ iteration 4154/ 159576 | consumed samples: 85680 | elapsed time per iteration (ms): 14575.3 | learning rate: 2.373E-05 | global batch size: 32 | lm loss: 6.475266E+00 | loss scale: 16384.0 | grad norm: 87775.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39970
+ time (ms)
39971
+ iteration 4155/ 159576 | consumed samples: 85712 | elapsed time per iteration (ms): 14452.5 | learning rate: 2.374E-05 | global batch size: 32 | lm loss: 6.456027E+00 | loss scale: 16384.0 | grad norm: 75377.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39972
+ time (ms)
39973
+ iteration 4156/ 159576 | consumed samples: 85744 | elapsed time per iteration (ms): 14769.4 | learning rate: 2.375E-05 | global batch size: 32 | lm loss: 6.436621E+00 | loss scale: 16384.0 | grad norm: 86270.120 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39974
+ time (ms)
39975
+ iteration 4157/ 159576 | consumed samples: 85776 | elapsed time per iteration (ms): 14484.6 | learning rate: 2.376E-05 | global batch size: 32 | lm loss: 6.502521E+00 | loss scale: 16384.0 | grad norm: 77291.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39976
+ time (ms)
39977
+ iteration 4158/ 159576 | consumed samples: 85808 | elapsed time per iteration (ms): 14605.4 | learning rate: 2.376E-05 | global batch size: 32 | lm loss: 6.271915E+00 | loss scale: 16384.0 | grad norm: 79782.510 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39978
+ time (ms)
39979
+ iteration 4159/ 159576 | consumed samples: 85840 | elapsed time per iteration (ms): 14468.5 | learning rate: 2.377E-05 | global batch size: 32 | lm loss: 6.375775E+00 | loss scale: 16384.0 | grad norm: 91679.045 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39980
+ time (ms)
39981
+ iteration 4160/ 159576 | consumed samples: 85872 | elapsed time per iteration (ms): 15055.2 | learning rate: 2.378E-05 | global batch size: 32 | lm loss: 6.207356E+00 | loss scale: 16384.0 | grad norm: 84700.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39982
+ time (ms)
39983
+ iteration 4161/ 159576 | consumed samples: 85904 | elapsed time per iteration (ms): 14639.9 | learning rate: 2.379E-05 | global batch size: 32 | lm loss: 6.385208E+00 | loss scale: 16384.0 | grad norm: 77383.793 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39984
+ time (ms)
39985
+ iteration 4162/ 159576 | consumed samples: 85936 | elapsed time per iteration (ms): 14461.5 | learning rate: 2.380E-05 | global batch size: 32 | lm loss: 6.480938E+00 | loss scale: 16384.0 | grad norm: 98154.860 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39986
+ time (ms)
39987
+ iteration 4163/ 159576 | consumed samples: 85968 | elapsed time per iteration (ms): 14557.2 | learning rate: 2.381E-05 | global batch size: 32 | lm loss: 6.427241E+00 | loss scale: 16384.0 | grad norm: 79663.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39988
+ time (ms)
39989
+ iteration 4164/ 159576 | consumed samples: 86000 | elapsed time per iteration (ms): 15046.3 | learning rate: 2.382E-05 | global batch size: 32 | lm loss: 6.310709E+00 | loss scale: 16384.0 | grad norm: 76469.866 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39990
+ time (ms)
39991
+ iteration 4165/ 159576 | consumed samples: 86032 | elapsed time per iteration (ms): 14517.1 | learning rate: 2.383E-05 | global batch size: 32 | lm loss: 6.597423E+00 | loss scale: 16384.0 | grad norm: 95179.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39992
+ time (ms)
39993
+ iteration 4166/ 159576 | consumed samples: 86064 | elapsed time per iteration (ms): 14562.4 | learning rate: 2.384E-05 | global batch size: 32 | lm loss: 6.398317E+00 | loss scale: 16384.0 | grad norm: 86889.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39994
+ time (ms)
39995
+ iteration 4167/ 159576 | consumed samples: 86096 | elapsed time per iteration (ms): 14577.1 | learning rate: 2.384E-05 | global batch size: 32 | lm loss: 6.447660E+00 | loss scale: 16384.0 | grad norm: 99510.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39996
+ time (ms)
39997
+ iteration 4168/ 159576 | consumed samples: 86128 | elapsed time per iteration (ms): 14813.0 | learning rate: 2.385E-05 | global batch size: 32 | lm loss: 6.528482E+00 | loss scale: 16384.0 | grad norm: 83413.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
39998
+ time (ms)
39999
+ iteration 4169/ 159576 | consumed samples: 86160 | elapsed time per iteration (ms): 14589.9 | learning rate: 2.386E-05 | global batch size: 32 | lm loss: 6.388697E+00 | loss scale: 16384.0 | grad norm: 76722.933 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40000
+ time (ms)
40001
+ iteration 4170/ 159576 | consumed samples: 86192 | elapsed time per iteration (ms): 14519.5 | learning rate: 2.387E-05 | global batch size: 32 | lm loss: 6.446240E+00 | loss scale: 16384.0 | grad norm: 85947.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40002
+ time (ms)
40003
+ iteration 4171/ 159576 | consumed samples: 86224 | elapsed time per iteration (ms): 14524.6 | learning rate: 2.388E-05 | global batch size: 32 | lm loss: 6.425363E+00 | loss scale: 16384.0 | grad norm: 88474.007 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40004
+ time (ms)
40005
+ iteration 4172/ 159576 | consumed samples: 86256 | elapsed time per iteration (ms): 14879.2 | learning rate: 2.389E-05 | global batch size: 32 | lm loss: 6.515138E+00 | loss scale: 16384.0 | grad norm: 108134.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40006
+ time (ms)
40007
+ iteration 4173/ 159576 | consumed samples: 86288 | elapsed time per iteration (ms): 14582.3 | learning rate: 2.390E-05 | global batch size: 32 | lm loss: 6.533965E+00 | loss scale: 16384.0 | grad norm: 76749.086 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40008
+ time (ms)
40009
+ iteration 4174/ 159576 | consumed samples: 86320 | elapsed time per iteration (ms): 14543.3 | learning rate: 2.391E-05 | global batch size: 32 | lm loss: 6.448212E+00 | loss scale: 16384.0 | grad norm: 93972.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40010
+ time (ms)
40011
+ iteration 4175/ 159576 | consumed samples: 86352 | elapsed time per iteration (ms): 14572.0 | learning rate: 2.392E-05 | global batch size: 32 | lm loss: 6.440217E+00 | loss scale: 16384.0 | grad norm: 102291.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40012
+ time (ms)
40013
+ iteration 4176/ 159576 | consumed samples: 86384 | elapsed time per iteration (ms): 14897.3 | learning rate: 2.392E-05 | global batch size: 32 | lm loss: 6.324600E+00 | loss scale: 16384.0 | grad norm: 81057.900 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40014
+ time (ms)
40015
+ iteration 4177/ 159576 | consumed samples: 86416 | elapsed time per iteration (ms): 14575.9 | learning rate: 2.393E-05 | global batch size: 32 | lm loss: 6.564878E+00 | loss scale: 16384.0 | grad norm: 96270.150 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40016
+ time (ms)
40017
+ iteration 4178/ 159576 | consumed samples: 86448 | elapsed time per iteration (ms): 14585.7 | learning rate: 2.394E-05 | global batch size: 32 | lm loss: 6.473108E+00 | loss scale: 16384.0 | grad norm: 80498.059 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40018
+ time (ms)
40019
+ iteration 4179/ 159576 | consumed samples: 86480 | elapsed time per iteration (ms): 14517.6 | learning rate: 2.395E-05 | global batch size: 32 | lm loss: 6.519761E+00 | loss scale: 16384.0 | grad norm: 90509.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40020
+ time (ms)
40021
+ iteration 4180/ 159576 | consumed samples: 86512 | elapsed time per iteration (ms): 14895.7 | learning rate: 2.396E-05 | global batch size: 32 | lm loss: 6.377243E+00 | loss scale: 16384.0 | grad norm: 92370.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40022
+ time (ms)
40023
+ iteration 4181/ 159576 | consumed samples: 86544 | elapsed time per iteration (ms): 14690.0 | learning rate: 2.397E-05 | global batch size: 32 | lm loss: 6.469300E+00 | loss scale: 16384.0 | grad norm: 89492.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40024
+ time (ms)
40025
+ iteration 4182/ 159576 | consumed samples: 86576 | elapsed time per iteration (ms): 14557.6 | learning rate: 2.398E-05 | global batch size: 32 | lm loss: 6.497668E+00 | loss scale: 16384.0 | grad norm: 104899.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40026
+ time (ms)
40027
+ iteration 4183/ 159576 | consumed samples: 86608 | elapsed time per iteration (ms): 14588.2 | learning rate: 2.399E-05 | global batch size: 32 | lm loss: 6.412446E+00 | loss scale: 16384.0 | grad norm: 81267.948 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40028
+ time (ms)
40029
+ iteration 4184/ 159576 | consumed samples: 86640 | elapsed time per iteration (ms): 14486.7 | learning rate: 2.400E-05 | global batch size: 32 | lm loss: 6.486274E+00 | loss scale: 16384.0 | grad norm: 95404.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40030
+ time (ms)
40031
+ iteration 4185/ 159576 | consumed samples: 86672 | elapsed time per iteration (ms): 14942.6 | learning rate: 2.400E-05 | global batch size: 32 | lm loss: 6.375100E+00 | loss scale: 16384.0 | grad norm: 82372.004 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40032
+ time (ms)
40033
+ iteration 4186/ 159576 | consumed samples: 86704 | elapsed time per iteration (ms): 14540.4 | learning rate: 2.401E-05 | global batch size: 32 | lm loss: 6.444688E+00 | loss scale: 16384.0 | grad norm: 102268.468 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40034
+ time (ms)
40035
+ iteration 4187/ 159576 | consumed samples: 86736 | elapsed time per iteration (ms): 14530.9 | learning rate: 2.402E-05 | global batch size: 32 | lm loss: 6.270885E+00 | loss scale: 16384.0 | grad norm: 85114.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40036
+ time (ms)
40037
+ iteration 4188/ 159576 | consumed samples: 86768 | elapsed time per iteration (ms): 14554.4 | learning rate: 2.403E-05 | global batch size: 32 | lm loss: 6.461191E+00 | loss scale: 16384.0 | grad norm: 82795.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40038
+ time (ms)
40039
+ iteration 4189/ 159576 | consumed samples: 86800 | elapsed time per iteration (ms): 14680.7 | learning rate: 2.404E-05 | global batch size: 32 | lm loss: 6.483377E+00 | loss scale: 16384.0 | grad norm: 106142.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40040
+ time (ms)
40041
+ iteration 4190/ 159576 | consumed samples: 86832 | elapsed time per iteration (ms): 14652.1 | learning rate: 2.405E-05 | global batch size: 32 | lm loss: 6.468819E+00 | loss scale: 16384.0 | grad norm: 83557.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40042
+ time (ms)
40043
+ iteration 4191/ 159576 | consumed samples: 86864 | elapsed time per iteration (ms): 14459.3 | learning rate: 2.406E-05 | global batch size: 32 | lm loss: 6.379012E+00 | loss scale: 16384.0 | grad norm: 90619.727 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40044
+ time (ms)
40045
+ iteration 4192/ 159576 | consumed samples: 86896 | elapsed time per iteration (ms): 14539.1 | learning rate: 2.407E-05 | global batch size: 32 | lm loss: 6.459314E+00 | loss scale: 16384.0 | grad norm: 94282.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40046
+ time (ms)
40047
+ iteration 4193/ 159576 | consumed samples: 86928 | elapsed time per iteration (ms): 14715.7 | learning rate: 2.408E-05 | global batch size: 32 | lm loss: 6.435170E+00 | loss scale: 16384.0 | grad norm: 92946.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40048
+ time (ms)
40049
+ iteration 4194/ 159576 | consumed samples: 86960 | elapsed time per iteration (ms): 14501.7 | learning rate: 2.408E-05 | global batch size: 32 | lm loss: 6.419791E+00 | loss scale: 16384.0 | grad norm: 78251.108 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40050
+ time (ms)
40051
+ iteration 4195/ 159576 | consumed samples: 86992 | elapsed time per iteration (ms): 14523.0 | learning rate: 2.409E-05 | global batch size: 32 | lm loss: 6.342591E+00 | loss scale: 16384.0 | grad norm: 80571.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40052
+ time (ms)
40053
+ iteration 4196/ 159576 | consumed samples: 87024 | elapsed time per iteration (ms): 14595.3 | learning rate: 2.410E-05 | global batch size: 32 | lm loss: 6.373145E+00 | loss scale: 16384.0 | grad norm: 106409.932 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40054
+ time (ms)
40055
+ iteration 4197/ 159576 | consumed samples: 87056 | elapsed time per iteration (ms): 14737.5 | learning rate: 2.411E-05 | global batch size: 32 | lm loss: 6.543087E+00 | loss scale: 16384.0 | grad norm: 81359.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40056
+ time (ms)
40057
+ iteration 4198/ 159576 | consumed samples: 87088 | elapsed time per iteration (ms): 14570.3 | learning rate: 2.412E-05 | global batch size: 32 | lm loss: 6.555972E+00 | loss scale: 16384.0 | grad norm: 101442.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40058
+ time (ms)
40059
+ iteration 4199/ 159576 | consumed samples: 87120 | elapsed time per iteration (ms): 14518.0 | learning rate: 2.413E-05 | global batch size: 32 | lm loss: 6.497987E+00 | loss scale: 16384.0 | grad norm: 87789.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40060
+ time (ms)
40061
+ iteration 4200/ 159576 | consumed samples: 87152 | elapsed time per iteration (ms): 14561.0 | learning rate: 2.414E-05 | global batch size: 32 | lm loss: 6.526636E+00 | loss scale: 16384.0 | grad norm: 97375.608 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40062
+ time (ms)
40063
+ iteration 4201/ 159576 | consumed samples: 87184 | elapsed time per iteration (ms): 14967.8 | learning rate: 2.415E-05 | global batch size: 32 | lm loss: 6.529594E+00 | loss scale: 16384.0 | grad norm: 98056.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40064
+ time (ms)
40065
+ iteration 4202/ 159576 | consumed samples: 87216 | elapsed time per iteration (ms): 14591.5 | learning rate: 2.416E-05 | global batch size: 32 | lm loss: 6.461559E+00 | loss scale: 16384.0 | grad norm: 103248.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40066
+ time (ms)
40067
+ iteration 4203/ 159576 | consumed samples: 87248 | elapsed time per iteration (ms): 14557.3 | learning rate: 2.416E-05 | global batch size: 32 | lm loss: 6.255905E+00 | loss scale: 16384.0 | grad norm: 98489.984 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40068
+ time (ms)
40069
+ iteration 4204/ 159576 | consumed samples: 87280 | elapsed time per iteration (ms): 14539.8 | learning rate: 2.417E-05 | global batch size: 32 | lm loss: 6.456792E+00 | loss scale: 16384.0 | grad norm: 90220.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40070
+ time (ms)
40071
+ iteration 4205/ 159576 | consumed samples: 87312 | elapsed time per iteration (ms): 14936.2 | learning rate: 2.418E-05 | global batch size: 32 | lm loss: 6.456956E+00 | loss scale: 16384.0 | grad norm: 99591.028 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40072
+ time (ms)
40073
+ iteration 4206/ 159576 | consumed samples: 87344 | elapsed time per iteration (ms): 14602.1 | learning rate: 2.419E-05 | global batch size: 32 | lm loss: 6.539675E+00 | loss scale: 16384.0 | grad norm: 106461.971 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40074
+ time (ms)
40075
+ iteration 4207/ 159576 | consumed samples: 87376 | elapsed time per iteration (ms): 14518.5 | learning rate: 2.420E-05 | global batch size: 32 | lm loss: 6.581583E+00 | loss scale: 16384.0 | grad norm: 104474.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40076
+ time (ms)
40077
+ iteration 4208/ 159576 | consumed samples: 87408 | elapsed time per iteration (ms): 14546.2 | learning rate: 2.421E-05 | global batch size: 32 | lm loss: 6.470299E+00 | loss scale: 16384.0 | grad norm: 103936.744 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40078
+ time (ms)
40079
+ iteration 4209/ 159576 | consumed samples: 87440 | elapsed time per iteration (ms): 14895.0 | learning rate: 2.422E-05 | global batch size: 32 | lm loss: 6.485046E+00 | loss scale: 16384.0 | grad norm: 103480.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40080
+ time (ms)
40081
+ iteration 4210/ 159576 | consumed samples: 87472 | elapsed time per iteration (ms): 14490.7 | learning rate: 2.423E-05 | global batch size: 32 | lm loss: 6.331614E+00 | loss scale: 16384.0 | grad norm: 92393.675 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40082
+ time (ms)
40083
+ iteration 4211/ 159576 | consumed samples: 87504 | elapsed time per iteration (ms): 14505.6 | learning rate: 2.424E-05 | global batch size: 32 | lm loss: 6.343493E+00 | loss scale: 16384.0 | grad norm: 138840.853 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40084
+ time (ms)
40085
+ iteration 4212/ 159576 | consumed samples: 87536 | elapsed time per iteration (ms): 14559.8 | learning rate: 2.424E-05 | global batch size: 32 | lm loss: 6.362164E+00 | loss scale: 16384.0 | grad norm: 105314.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40086
+ time (ms)
40087
+ iteration 4213/ 159576 | consumed samples: 87568 | elapsed time per iteration (ms): 14962.7 | learning rate: 2.425E-05 | global batch size: 32 | lm loss: 6.413978E+00 | loss scale: 16384.0 | grad norm: 100396.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40088
+ time (ms)
40089
+ iteration 4214/ 159576 | consumed samples: 87600 | elapsed time per iteration (ms): 14459.8 | learning rate: 2.426E-05 | global batch size: 32 | lm loss: 6.333343E+00 | loss scale: 16384.0 | grad norm: 101809.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40090
+ time (ms)
40091
+ iteration 4215/ 159576 | consumed samples: 87632 | elapsed time per iteration (ms): 14541.9 | learning rate: 2.427E-05 | global batch size: 32 | lm loss: 6.552740E+00 | loss scale: 16384.0 | grad norm: 198031.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40092
+ time (ms)
40093
+ iteration 4216/ 159576 | consumed samples: 87664 | elapsed time per iteration (ms): 14546.7 | learning rate: 2.428E-05 | global batch size: 32 | lm loss: 6.373903E+00 | loss scale: 16384.0 | grad norm: 98034.031 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40094
+ time (ms)
40095
+ iteration 4217/ 159576 | consumed samples: 87696 | elapsed time per iteration (ms): 14848.3 | learning rate: 2.429E-05 | global batch size: 32 | lm loss: 6.452424E+00 | loss scale: 16384.0 | grad norm: 267522.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40096
+ time (ms)
40097
+ iteration 4218/ 159576 | consumed samples: 87728 | elapsed time per iteration (ms): 14570.6 | learning rate: 2.430E-05 | global batch size: 32 | lm loss: 6.493920E+00 | loss scale: 16384.0 | grad norm: 121372.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40098
+ time (ms)
40099
+ iteration 4219/ 159576 | consumed samples: 87760 | elapsed time per iteration (ms): 14553.1 | learning rate: 2.431E-05 | global batch size: 32 | lm loss: 6.478834E+00 | loss scale: 16384.0 | grad norm: 112151.991 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40100
+ time (ms)
40101
+ iteration 4220/ 159576 | consumed samples: 87792 | elapsed time per iteration (ms): 14546.6 | learning rate: 2.432E-05 | global batch size: 32 | lm loss: 6.452081E+00 | loss scale: 16384.0 | grad norm: 164176.147 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40102
+ time (ms)
40103
+ iteration 4221/ 159576 | consumed samples: 87824 | elapsed time per iteration (ms): 14866.7 | learning rate: 2.432E-05 | global batch size: 32 | lm loss: 6.616721E+00 | loss scale: 16384.0 | grad norm: 88412.117 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40104
+ time (ms)
40105
+ iteration 4222/ 159576 | consumed samples: 87856 | elapsed time per iteration (ms): 14831.9 | learning rate: 2.433E-05 | global batch size: 32 | lm loss: 6.396004E+00 | loss scale: 16384.0 | grad norm: 116548.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40106
+ time (ms)
40107
+ iteration 4223/ 159576 | consumed samples: 87888 | elapsed time per iteration (ms): 14530.1 | learning rate: 2.434E-05 | global batch size: 32 | lm loss: 6.223457E+00 | loss scale: 16384.0 | grad norm: 151936.770 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40108
+ time (ms)
40109
+ iteration 4224/ 159576 | consumed samples: 87920 | elapsed time per iteration (ms): 14526.4 | learning rate: 2.435E-05 | global batch size: 32 | lm loss: 6.471479E+00 | loss scale: 16384.0 | grad norm: 107150.884 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40110
+ time (ms)
40111
+ iteration 4225/ 159576 | consumed samples: 87952 | elapsed time per iteration (ms): 14556.3 | learning rate: 2.436E-05 | global batch size: 32 | lm loss: 6.420123E+00 | loss scale: 16384.0 | grad norm: 118336.101 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40112
+ time (ms)
40113
+ iteration 4226/ 159576 | consumed samples: 87984 | elapsed time per iteration (ms): 14779.5 | learning rate: 2.437E-05 | global batch size: 32 | lm loss: 6.463729E+00 | loss scale: 16384.0 | grad norm: 105104.920 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40114
+ time (ms)
40115
+ iteration 4227/ 159576 | consumed samples: 88016 | elapsed time per iteration (ms): 14616.1 | learning rate: 2.438E-05 | global batch size: 32 | lm loss: 6.384348E+00 | loss scale: 16384.0 | grad norm: 121857.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40116
+ time (ms)
40117
+ iteration 4228/ 159576 | consumed samples: 88048 | elapsed time per iteration (ms): 14595.0 | learning rate: 2.439E-05 | global batch size: 32 | lm loss: 6.562186E+00 | loss scale: 16384.0 | grad norm: 120895.871 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40118
+ time (ms)
40119
+ iteration 4229/ 159576 | consumed samples: 88080 | elapsed time per iteration (ms): 14592.9 | learning rate: 2.439E-05 | global batch size: 32 | lm loss: 6.614166E+00 | loss scale: 16384.0 | grad norm: 141989.840 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40120
+ time (ms)
40121
+ iteration 4230/ 159576 | consumed samples: 88112 | elapsed time per iteration (ms): 14745.8 | learning rate: 2.440E-05 | global batch size: 32 | lm loss: 6.416856E+00 | loss scale: 16384.0 | grad norm: 135385.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40122
+ time (ms)
40123
+ iteration 4231/ 159576 | consumed samples: 88144 | elapsed time per iteration (ms): 14547.3 | learning rate: 2.441E-05 | global batch size: 32 | lm loss: 6.576384E+00 | loss scale: 16384.0 | grad norm: 129034.853 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40124
+ time (ms)
40125
+ iteration 4232/ 159576 | consumed samples: 88176 | elapsed time per iteration (ms): 14539.9 | learning rate: 2.442E-05 | global batch size: 32 | lm loss: 6.371499E+00 | loss scale: 16384.0 | grad norm: 102463.674 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40126
+ time (ms)
40127
+ iteration 4233/ 159576 | consumed samples: 88208 | elapsed time per iteration (ms): 14580.8 | learning rate: 2.443E-05 | global batch size: 32 | lm loss: 6.598085E+00 | loss scale: 16384.0 | grad norm: 105075.872 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40128
+ time (ms)
40129
+ iteration 4234/ 159576 | consumed samples: 88240 | elapsed time per iteration (ms): 14766.2 | learning rate: 2.444E-05 | global batch size: 32 | lm loss: 6.536204E+00 | loss scale: 16384.0 | grad norm: 109004.528 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40130
+ time (ms)
40131
+ iteration 4235/ 159576 | consumed samples: 88272 | elapsed time per iteration (ms): 14518.0 | learning rate: 2.445E-05 | global batch size: 32 | lm loss: 6.663161E+00 | loss scale: 16384.0 | grad norm: 197099.956 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40132
+ time (ms)
40133
+ iteration 4236/ 159576 | consumed samples: 88304 | elapsed time per iteration (ms): 14598.2 | learning rate: 2.446E-05 | global batch size: 32 | lm loss: 6.451008E+00 | loss scale: 16384.0 | grad norm: 125746.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40134
+ time (ms)
40135
+ iteration 4237/ 159576 | consumed samples: 88336 | elapsed time per iteration (ms): 14568.7 | learning rate: 2.447E-05 | global batch size: 32 | lm loss: 6.306778E+00 | loss scale: 16384.0 | grad norm: 145717.953 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40136
+ time (ms)
40137
+ iteration 4238/ 159576 | consumed samples: 88368 | elapsed time per iteration (ms): 14844.4 | learning rate: 2.447E-05 | global batch size: 32 | lm loss: 6.637146E+00 | loss scale: 16384.0 | grad norm: 161986.022 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40138
+ time (ms)
40139
+ iteration 4239/ 159576 | consumed samples: 88400 | elapsed time per iteration (ms): 14550.6 | learning rate: 2.448E-05 | global batch size: 32 | lm loss: 6.518569E+00 | loss scale: 16384.0 | grad norm: 114815.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40140
+ time (ms)
40141
+ iteration 4240/ 159576 | consumed samples: 88432 | elapsed time per iteration (ms): 14540.5 | learning rate: 2.449E-05 | global batch size: 32 | lm loss: 6.644086E+00 | loss scale: 16384.0 | grad norm: 127083.954 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40142
+ time (ms)
40143
+ iteration 4241/ 159576 | consumed samples: 88464 | elapsed time per iteration (ms): 14556.9 | learning rate: 2.450E-05 | global batch size: 32 | lm loss: 6.359149E+00 | loss scale: 16384.0 | grad norm: 119916.985 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40144
+ time (ms)
40145
+ iteration 4242/ 159576 | consumed samples: 88496 | elapsed time per iteration (ms): 14950.3 | learning rate: 2.451E-05 | global batch size: 32 | lm loss: 6.517668E+00 | loss scale: 16384.0 | grad norm: 116850.173 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40146
+ time (ms)
40147
+ iteration 4243/ 159576 | consumed samples: 88528 | elapsed time per iteration (ms): 14575.9 | learning rate: 2.452E-05 | global batch size: 32 | lm loss: 6.345152E+00 | loss scale: 16384.0 | grad norm: 106829.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40148
+ time (ms)
40149
+ iteration 4244/ 159576 | consumed samples: 88560 | elapsed time per iteration (ms): 14588.0 | learning rate: 2.453E-05 | global batch size: 32 | lm loss: 6.476923E+00 | loss scale: 16384.0 | grad norm: 121409.721 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40150
+ time (ms)
40151
+ iteration 4245/ 159576 | consumed samples: 88592 | elapsed time per iteration (ms): 14539.0 | learning rate: 2.454E-05 | global batch size: 32 | lm loss: 6.428369E+00 | loss scale: 16384.0 | grad norm: 99872.898 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40152
+ time (ms)
40153
+ iteration 4246/ 159576 | consumed samples: 88624 | elapsed time per iteration (ms): 15044.1 | learning rate: 2.455E-05 | global batch size: 32 | lm loss: 6.447415E+00 | loss scale: 16384.0 | grad norm: 102765.648 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40154
+ time (ms)
40155
+ iteration 4247/ 159576 | consumed samples: 88656 | elapsed time per iteration (ms): 14546.9 | learning rate: 2.455E-05 | global batch size: 32 | lm loss: 6.336578E+00 | loss scale: 16384.0 | grad norm: 90835.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40156
+ time (ms)
40157
+ iteration 4248/ 159576 | consumed samples: 88688 | elapsed time per iteration (ms): 14540.1 | learning rate: 2.456E-05 | global batch size: 32 | lm loss: 6.555513E+00 | loss scale: 16384.0 | grad norm: 104407.993 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40158
+ time (ms)
40159
+ iteration 4249/ 159576 | consumed samples: 88720 | elapsed time per iteration (ms): 14613.4 | learning rate: 2.457E-05 | global batch size: 32 | lm loss: 6.546042E+00 | loss scale: 16384.0 | grad norm: 115379.011 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40160
+ time (ms)
40161
+ iteration 4250/ 159576 | consumed samples: 88752 | elapsed time per iteration (ms): 14829.6 | learning rate: 2.458E-05 | global batch size: 32 | lm loss: 6.436588E+00 | loss scale: 16384.0 | grad norm: 107293.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40162
+ time (ms)
40163
+ iteration 4251/ 159576 | consumed samples: 88784 | elapsed time per iteration (ms): 14544.9 | learning rate: 2.459E-05 | global batch size: 32 | lm loss: 6.438442E+00 | loss scale: 16384.0 | grad norm: 105034.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40164
+ time (ms)
40165
+ iteration 4252/ 159576 | consumed samples: 88816 | elapsed time per iteration (ms): 14563.6 | learning rate: 2.460E-05 | global batch size: 32 | lm loss: 6.473608E+00 | loss scale: 16384.0 | grad norm: 84036.769 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40166
+ time (ms)
40167
+ iteration 4253/ 159576 | consumed samples: 88848 | elapsed time per iteration (ms): 14528.1 | learning rate: 2.461E-05 | global batch size: 32 | lm loss: 6.422614E+00 | loss scale: 16384.0 | grad norm: 95068.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40168
+ time (ms)
40169
+ iteration 4254/ 159576 | consumed samples: 88880 | elapsed time per iteration (ms): 14918.1 | learning rate: 2.462E-05 | global batch size: 32 | lm loss: 6.295578E+00 | loss scale: 16384.0 | grad norm: 114489.641 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40170
+ time (ms)
40171
+ iteration 4255/ 159576 | consumed samples: 88912 | elapsed time per iteration (ms): 14525.9 | learning rate: 2.463E-05 | global batch size: 32 | lm loss: 6.416272E+00 | loss scale: 16384.0 | grad norm: 91261.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40172
+ time (ms)
40173
+ iteration 4256/ 159576 | consumed samples: 88944 | elapsed time per iteration (ms): 14525.5 | learning rate: 2.463E-05 | global batch size: 32 | lm loss: 6.517479E+00 | loss scale: 32768.0 | grad norm: 94254.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40174
+ time (ms)
40175
+ iteration 4257/ 159576 | consumed samples: 88976 | elapsed time per iteration (ms): 14555.5 | learning rate: 2.464E-05 | global batch size: 32 | lm loss: 6.469455E+00 | loss scale: 32768.0 | grad norm: 174372.981 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40176
+ time (ms)
40177
+ iteration 4258/ 159576 | consumed samples: 89008 | elapsed time per iteration (ms): 14928.2 | learning rate: 2.465E-05 | global batch size: 32 | lm loss: 6.408867E+00 | loss scale: 32768.0 | grad norm: 205212.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40178
+ time (ms)
40179
+ iteration 4259/ 159576 | consumed samples: 89040 | elapsed time per iteration (ms): 14529.5 | learning rate: 2.466E-05 | global batch size: 32 | lm loss: 6.518348E+00 | loss scale: 32768.0 | grad norm: 175125.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40180
+ time (ms)
40181
+ iteration 4260/ 159576 | consumed samples: 89072 | elapsed time per iteration (ms): 14608.9 | learning rate: 2.467E-05 | global batch size: 32 | lm loss: 6.456366E+00 | loss scale: 32768.0 | grad norm: 180925.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40182
+ time (ms)
40183
+ iteration 4261/ 159576 | consumed samples: 89104 | elapsed time per iteration (ms): 14541.2 | learning rate: 2.468E-05 | global batch size: 32 | lm loss: 6.688640E+00 | loss scale: 32768.0 | grad norm: 205129.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40184
+ time (ms)
40185
+ iteration 4262/ 159576 | consumed samples: 89136 | elapsed time per iteration (ms): 14984.8 | learning rate: 2.469E-05 | global batch size: 32 | lm loss: 6.381848E+00 | loss scale: 32768.0 | grad norm: 194086.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40186
+ time (ms)
40187
+ iteration 4263/ 159576 | consumed samples: 89168 | elapsed time per iteration (ms): 14627.4 | learning rate: 2.470E-05 | global batch size: 32 | lm loss: 6.325251E+00 | loss scale: 32768.0 | grad norm: 200329.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40188
+ time (ms)
40189
+ iteration 4264/ 159576 | consumed samples: 89200 | elapsed time per iteration (ms): 14514.4 | learning rate: 2.471E-05 | global batch size: 32 | lm loss: 6.384187E+00 | loss scale: 32768.0 | grad norm: 206513.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40190
+ time (ms)
40191
+ iteration 4265/ 159576 | consumed samples: 89232 | elapsed time per iteration (ms): 14532.8 | learning rate: 2.471E-05 | global batch size: 32 | lm loss: 6.524798E+00 | loss scale: 32768.0 | grad norm: 207588.856 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40192
+ time (ms)
40193
+ iteration 4266/ 159576 | consumed samples: 89264 | elapsed time per iteration (ms): 14499.0 | learning rate: 2.472E-05 | global batch size: 32 | lm loss: 6.427965E+00 | loss scale: 32768.0 | grad norm: 270396.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40194
+ time (ms)
40195
+ iteration 4267/ 159576 | consumed samples: 89296 | elapsed time per iteration (ms): 14964.3 | learning rate: 2.473E-05 | global batch size: 32 | lm loss: 6.508441E+00 | loss scale: 32768.0 | grad norm: 256825.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40196
+ time (ms)
40197
+ iteration 4268/ 159576 | consumed samples: 89328 | elapsed time per iteration (ms): 14573.4 | learning rate: 2.474E-05 | global batch size: 32 | lm loss: 6.281446E+00 | loss scale: 32768.0 | grad norm: 175050.841 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40198
+ time (ms)
40199
+ iteration 4269/ 159576 | consumed samples: 89360 | elapsed time per iteration (ms): 14497.3 | learning rate: 2.475E-05 | global batch size: 32 | lm loss: 6.477619E+00 | loss scale: 32768.0 | grad norm: 194699.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40200
+ time (ms)
40201
+ iteration 4270/ 159576 | consumed samples: 89392 | elapsed time per iteration (ms): 14560.8 | learning rate: 2.476E-05 | global batch size: 32 | lm loss: 6.521669E+00 | loss scale: 32768.0 | grad norm: 204025.819 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40202
+ time (ms)
40203
+ iteration 4271/ 159576 | consumed samples: 89424 | elapsed time per iteration (ms): 14634.9 | learning rate: 2.477E-05 | global batch size: 32 | lm loss: 6.532991E+00 | loss scale: 32768.0 | grad norm: 218350.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40204
+ time (ms)
40205
+ iteration 4272/ 159576 | consumed samples: 89456 | elapsed time per iteration (ms): 14566.6 | learning rate: 2.478E-05 | global batch size: 32 | lm loss: 6.491451E+00 | loss scale: 32768.0 | grad norm: 196213.759 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40206
+ time (ms)
40207
+ iteration 4273/ 159576 | consumed samples: 89488 | elapsed time per iteration (ms): 14504.5 | learning rate: 2.479E-05 | global batch size: 32 | lm loss: 6.527338E+00 | loss scale: 32768.0 | grad norm: 254430.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40208
+ time (ms)
40209
+ iteration 4274/ 159576 | consumed samples: 89520 | elapsed time per iteration (ms): 14538.5 | learning rate: 2.479E-05 | global batch size: 32 | lm loss: 6.303001E+00 | loss scale: 32768.0 | grad norm: 189173.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40210
+ time (ms)
40211
+ iteration 4275/ 159576 | consumed samples: 89552 | elapsed time per iteration (ms): 14691.4 | learning rate: 2.480E-05 | global batch size: 32 | lm loss: 6.465518E+00 | loss scale: 32768.0 | grad norm: 266867.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40212
+ time (ms)
40213
+ iteration 4276/ 159576 | consumed samples: 89584 | elapsed time per iteration (ms): 14571.4 | learning rate: 2.481E-05 | global batch size: 32 | lm loss: 6.562708E+00 | loss scale: 32768.0 | grad norm: 213181.091 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40214
+ time (ms)
40215
+ iteration 4277/ 159576 | consumed samples: 89616 | elapsed time per iteration (ms): 14513.3 | learning rate: 2.482E-05 | global batch size: 32 | lm loss: 6.490031E+00 | loss scale: 32768.0 | grad norm: 200238.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40216
+ time (ms)
40217
+ iteration 4278/ 159576 | consumed samples: 89648 | elapsed time per iteration (ms): 14545.3 | learning rate: 2.483E-05 | global batch size: 32 | lm loss: 6.452188E+00 | loss scale: 32768.0 | grad norm: 209603.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40218
+ time (ms)
40219
+ iteration 4279/ 159576 | consumed samples: 89680 | elapsed time per iteration (ms): 14892.6 | learning rate: 2.484E-05 | global batch size: 32 | lm loss: 6.402837E+00 | loss scale: 32768.0 | grad norm: 213512.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40220
+ time (ms)
40221
+ iteration 4280/ 159576 | consumed samples: 89712 | elapsed time per iteration (ms): 14552.6 | learning rate: 2.485E-05 | global batch size: 32 | lm loss: 6.481530E+00 | loss scale: 32768.0 | grad norm: 218939.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40222
+ time (ms)
40223
+ iteration 4281/ 159576 | consumed samples: 89744 | elapsed time per iteration (ms): 14525.9 | learning rate: 2.486E-05 | global batch size: 32 | lm loss: 6.481557E+00 | loss scale: 32768.0 | grad norm: 211553.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40224
+ time (ms)
40225
+ iteration 4282/ 159576 | consumed samples: 89776 | elapsed time per iteration (ms): 14536.1 | learning rate: 2.487E-05 | global batch size: 32 | lm loss: 6.396571E+00 | loss scale: 32768.0 | grad norm: 200119.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40226
+ time (ms)
40227
+ iteration 4283/ 159576 | consumed samples: 89808 | elapsed time per iteration (ms): 14897.4 | learning rate: 2.487E-05 | global batch size: 32 | lm loss: 6.437448E+00 | loss scale: 32768.0 | grad norm: 211733.893 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40228
+ time (ms)
40229
+ iteration 4284/ 159576 | consumed samples: 89840 | elapsed time per iteration (ms): 14635.9 | learning rate: 2.488E-05 | global batch size: 32 | lm loss: 6.477830E+00 | loss scale: 32768.0 | grad norm: 273937.689 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40230
+ time (ms)
40231
+ iteration 4285/ 159576 | consumed samples: 89872 | elapsed time per iteration (ms): 14565.4 | learning rate: 2.489E-05 | global batch size: 32 | lm loss: 6.567824E+00 | loss scale: 32768.0 | grad norm: 210402.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40232
+ time (ms)
40233
+ iteration 4286/ 159576 | consumed samples: 89904 | elapsed time per iteration (ms): 14519.6 | learning rate: 2.490E-05 | global batch size: 32 | lm loss: 6.385768E+00 | loss scale: 32768.0 | grad norm: 203200.040 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40234
+ time (ms)
40235
+ iteration 4287/ 159576 | consumed samples: 89936 | elapsed time per iteration (ms): 14914.9 | learning rate: 2.491E-05 | global batch size: 32 | lm loss: 6.397992E+00 | loss scale: 32768.0 | grad norm: 182816.610 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40236
+ time (ms)
40237
+ iteration 4288/ 159576 | consumed samples: 89968 | elapsed time per iteration (ms): 14476.6 | learning rate: 2.492E-05 | global batch size: 32 | lm loss: 6.388610E+00 | loss scale: 32768.0 | grad norm: 199735.518 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40238
+ time (ms)
40239
+ iteration 4289/ 159576 | consumed samples: 90000 | elapsed time per iteration (ms): 14570.5 | learning rate: 2.493E-05 | global batch size: 32 | lm loss: 6.506209E+00 | loss scale: 32768.0 | grad norm: 206990.921 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40240
+ time (ms)
40241
+ iteration 4290/ 159576 | consumed samples: 90032 | elapsed time per iteration (ms): 14531.9 | learning rate: 2.494E-05 | global batch size: 32 | lm loss: 6.351604E+00 | loss scale: 32768.0 | grad norm: 204481.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40242
+ time (ms)
40243
+ iteration 4291/ 159576 | consumed samples: 90064 | elapsed time per iteration (ms): 14860.6 | learning rate: 2.495E-05 | global batch size: 32 | lm loss: 6.518882E+00 | loss scale: 32768.0 | grad norm: 236219.696 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40244
+ time (ms)
40245
+ iteration 4292/ 159576 | consumed samples: 90096 | elapsed time per iteration (ms): 14581.4 | learning rate: 2.495E-05 | global batch size: 32 | lm loss: 6.428777E+00 | loss scale: 32768.0 | grad norm: 187907.904 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40246
+ time (ms)
40247
+ iteration 4293/ 159576 | consumed samples: 90128 | elapsed time per iteration (ms): 14508.1 | learning rate: 2.496E-05 | global batch size: 32 | lm loss: 6.327142E+00 | loss scale: 32768.0 | grad norm: 204872.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40248
+ time (ms)
40249
+ iteration 4294/ 159576 | consumed samples: 90160 | elapsed time per iteration (ms): 14534.7 | learning rate: 2.497E-05 | global batch size: 32 | lm loss: 6.385339E+00 | loss scale: 32768.0 | grad norm: 233375.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40250
+ time (ms)
40251
+ iteration 4295/ 159576 | consumed samples: 90192 | elapsed time per iteration (ms): 14858.3 | learning rate: 2.498E-05 | global batch size: 32 | lm loss: 6.416627E+00 | loss scale: 32768.0 | grad norm: 222806.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40252
+ time (ms)
40253
+ iteration 4296/ 159576 | consumed samples: 90224 | elapsed time per iteration (ms): 14474.6 | learning rate: 2.499E-05 | global batch size: 32 | lm loss: 6.518059E+00 | loss scale: 32768.0 | grad norm: 226593.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40254
+ time (ms)
40255
+ iteration 4297/ 159576 | consumed samples: 90256 | elapsed time per iteration (ms): 14569.0 | learning rate: 2.500E-05 | global batch size: 32 | lm loss: 6.133147E+00 | loss scale: 32768.0 | grad norm: 267419.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40256
+ time (ms)
40257
+ iteration 4298/ 159576 | consumed samples: 90288 | elapsed time per iteration (ms): 14566.4 | learning rate: 2.501E-05 | global batch size: 32 | lm loss: 6.308548E+00 | loss scale: 32768.0 | grad norm: 204598.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40258
+ time (ms)
40259
+ iteration 4299/ 159576 | consumed samples: 90320 | elapsed time per iteration (ms): 14984.7 | learning rate: 2.502E-05 | global batch size: 32 | lm loss: 6.369866E+00 | loss scale: 32768.0 | grad norm: 221545.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40260
+ time (ms)
40261
+ iteration 4300/ 159576 | consumed samples: 90352 | elapsed time per iteration (ms): 14484.6 | learning rate: 2.503E-05 | global batch size: 32 | lm loss: 6.530766E+00 | loss scale: 32768.0 | grad norm: 267800.060 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40262
+ time (ms)
40263
+ iteration 4301/ 159576 | consumed samples: 90384 | elapsed time per iteration (ms): 14557.5 | learning rate: 2.503E-05 | global batch size: 32 | lm loss: 6.503004E+00 | loss scale: 32768.0 | grad norm: 228461.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40264
+ time (ms)
40265
+ iteration 4302/ 159576 | consumed samples: 90416 | elapsed time per iteration (ms): 14550.0 | learning rate: 2.504E-05 | global batch size: 32 | lm loss: 6.538440E+00 | loss scale: 32768.0 | grad norm: 190026.980 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40266
+ time (ms)
40267
+ iteration 4303/ 159576 | consumed samples: 90448 | elapsed time per iteration (ms): 14655.7 | learning rate: 2.505E-05 | global batch size: 32 | lm loss: 6.461242E+00 | loss scale: 32768.0 | grad norm: 211257.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40268
+ time (ms)
40269
+ iteration 4304/ 159576 | consumed samples: 90480 | elapsed time per iteration (ms): 14769.1 | learning rate: 2.506E-05 | global batch size: 32 | lm loss: 6.479248E+00 | loss scale: 32768.0 | grad norm: 198712.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40270
+ time (ms)
40271
+ iteration 4305/ 159576 | consumed samples: 90512 | elapsed time per iteration (ms): 14577.3 | learning rate: 2.507E-05 | global batch size: 32 | lm loss: 6.432651E+00 | loss scale: 32768.0 | grad norm: 206822.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40272
+ time (ms)
40273
+ iteration 4306/ 159576 | consumed samples: 90544 | elapsed time per iteration (ms): 14533.2 | learning rate: 2.508E-05 | global batch size: 32 | lm loss: 6.347961E+00 | loss scale: 32768.0 | grad norm: 195748.989 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40274
+ time (ms)
40275
+ iteration 4307/ 159576 | consumed samples: 90576 | elapsed time per iteration (ms): 14563.8 | learning rate: 2.509E-05 | global batch size: 32 | lm loss: 6.507642E+00 | loss scale: 32768.0 | grad norm: 218663.158 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40276
+ time (ms)
40277
+ iteration 4308/ 159576 | consumed samples: 90608 | elapsed time per iteration (ms): 14732.7 | learning rate: 2.510E-05 | global batch size: 32 | lm loss: 6.541059E+00 | loss scale: 32768.0 | grad norm: 228970.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40278
+ time (ms)
40279
+ iteration 4309/ 159576 | consumed samples: 90640 | elapsed time per iteration (ms): 14469.9 | learning rate: 2.511E-05 | global batch size: 32 | lm loss: 6.424891E+00 | loss scale: 32768.0 | grad norm: 196198.889 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40280
+ time (ms)
40281
+ iteration 4310/ 159576 | consumed samples: 90672 | elapsed time per iteration (ms): 14508.3 | learning rate: 2.511E-05 | global batch size: 32 | lm loss: 6.490376E+00 | loss scale: 32768.0 | grad norm: 215960.903 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40282
+ time (ms)
40283
+ iteration 4311/ 159576 | consumed samples: 90704 | elapsed time per iteration (ms): 14508.3 | learning rate: 2.512E-05 | global batch size: 32 | lm loss: 6.488754E+00 | loss scale: 32768.0 | grad norm: 195374.466 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40284
+ time (ms)
40285
+ iteration 4312/ 159576 | consumed samples: 90736 | elapsed time per iteration (ms): 14753.9 | learning rate: 2.513E-05 | global batch size: 32 | lm loss: 6.448671E+00 | loss scale: 32768.0 | grad norm: 227732.025 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40286
+ time (ms)
40287
+ iteration 4313/ 159576 | consumed samples: 90768 | elapsed time per iteration (ms): 14571.8 | learning rate: 2.514E-05 | global batch size: 32 | lm loss: 6.500753E+00 | loss scale: 32768.0 | grad norm: 266264.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40288
+ time (ms)
40289
+ iteration 4314/ 159576 | consumed samples: 90800 | elapsed time per iteration (ms): 14601.7 | learning rate: 2.515E-05 | global batch size: 32 | lm loss: 6.454448E+00 | loss scale: 32768.0 | grad norm: 224312.848 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40290
+ time (ms)
40291
+ iteration 4315/ 159576 | consumed samples: 90832 | elapsed time per iteration (ms): 14520.9 | learning rate: 2.516E-05 | global batch size: 32 | lm loss: 6.340928E+00 | loss scale: 32768.0 | grad norm: 252168.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40292
+ time (ms)
40293
+ iteration 4316/ 159576 | consumed samples: 90864 | elapsed time per iteration (ms): 14650.6 | learning rate: 2.517E-05 | global batch size: 32 | lm loss: 6.524774E+00 | loss scale: 32768.0 | grad norm: 233060.511 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40294
+ time (ms)
40295
+ iteration 4317/ 159576 | consumed samples: 90896 | elapsed time per iteration (ms): 14507.8 | learning rate: 2.518E-05 | global batch size: 32 | lm loss: 6.526123E+00 | loss scale: 32768.0 | grad norm: 228145.157 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40296
+ time (ms)
40297
+ iteration 4318/ 159576 | consumed samples: 90928 | elapsed time per iteration (ms): 14505.6 | learning rate: 2.518E-05 | global batch size: 32 | lm loss: 6.554380E+00 | loss scale: 32768.0 | grad norm: 215247.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
40298
+ time (ms)
40299
+ [2021-09-24 19:07:09] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
40300
+ [2021-09-24 19:07:09] PULSE: tr8-104B is running for 13:14:58 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])