bigscience-bot
committed on
Commit 1d6b6d1 · 1 Parent(s): bb486b9
new data
logs/main_log.txt CHANGED +494 -0
@@ -39804,3 +39804,497 @@ time (ms)
|
|
39804 |
time (ms)
|
39805 |
[2021-09-24 18:07:07] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
|
39806 |
[2021-09-24 18:07:07] PULSE: tr8-104B is running for 12:14:56 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
|
39807 |
+
iteration 4073/ 159576 | consumed samples: 83088 | elapsed time per iteration (ms): 14430.9 | learning rate: 2.301E-05 | global batch size: 32 | lm loss: 6.464416E+00 | loss scale: 16384.0 | grad norm: 92935.764 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
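The added `iteration` lines share a pipe-delimited `key: value` layout, so they can be turned into records for plotting or sanity checks. The following is a minimal sketch under that assumption only; `parse_iteration_line` is a hypothetical helper introduced here for illustration and is not part of the training code that produced this log.

```python
import re


def parse_iteration_line(line: str) -> dict:
    """Parse one 'iteration N/ TOTAL | key: value | ...' log line into a dict.

    Assumes every field after the first is 'name: number'; raises ValueError
    for lines that do not start with an iteration counter.
    """
    # Split on the pipe separators and drop empty trailing fields.
    fields = [f.strip() for f in line.split("|") if f.strip()]
    head = re.match(r"iteration\s+(\d+)\s*/\s*(\d+)", fields[0]) if fields else None
    if head is None:
        raise ValueError("not an iteration line")
    record = {
        "iteration": int(head.group(1)),
        "total_iterations": int(head.group(2)),
    }
    for field in fields[1:]:
        # Split on the last colon so names such as
        # 'elapsed time per iteration (ms)' stay intact.
        key, _, value = field.rpartition(":")
        record[key.strip()] = float(value)
    return record


sample = (
    "iteration 4073/ 159576 | consumed samples: 83088 | "
    "elapsed time per iteration (ms): 14430.9 | learning rate: 2.301E-05 | "
    "global batch size: 32 | lm loss: 6.464416E+00 | loss scale: 16384.0 | "
    "grad norm: 92935.764 | num zeros: 0.0 | "
    "number of skipped iterations: 0 | number of nan iterations: 0 |"
)
record = parse_iteration_line(sample)
print(record["iteration"], record["lm loss"], record["grad norm"])
```

All numeric fields are read as floats for simplicity; a stricter version could keep counters such as `consumed samples` as integers.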
39808 |
+
time (ms)
|
39809 |
+
iteration 4074/ 159576 | consumed samples: 83120 | elapsed time per iteration (ms): 14595.5 | learning rate: 2.302E-05 | global batch size: 32 | lm loss: 6.394172E+00 | loss scale: 16384.0 | grad norm: 93727.497 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39810 |
+
time (ms)
|
39811 |
+
iteration 4075/ 159576 | consumed samples: 83152 | elapsed time per iteration (ms): 14478.6 | learning rate: 2.303E-05 | global batch size: 32 | lm loss: 6.535138E+00 | loss scale: 16384.0 | grad norm: 110910.133 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39812 |
+
time (ms)
|
39813 |
+
iteration 4076/ 159576 | consumed samples: 83184 | elapsed time per iteration (ms): 14559.7 | learning rate: 2.304E-05 | global batch size: 32 | lm loss: 6.459756E+00 | loss scale: 16384.0 | grad norm: 79798.141 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39814 |
+
time (ms)
|
39815 |
+
iteration 4077/ 159576 | consumed samples: 83216 | elapsed time per iteration (ms): 14529.0 | learning rate: 2.305E-05 | global batch size: 32 | lm loss: 6.388766E+00 | loss scale: 16384.0 | grad norm: 80153.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39816 |
+
time (ms)
|
39817 |
+
iteration 4078/ 159576 | consumed samples: 83248 | elapsed time per iteration (ms): 15028.3 | learning rate: 2.305E-05 | global batch size: 32 | lm loss: 6.462305E+00 | loss scale: 16384.0 | grad norm: 72541.670 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39818 |
+
time (ms)
|
39819 |
+
iteration 4079/ 159576 | consumed samples: 83280 | elapsed time per iteration (ms): 14501.7 | learning rate: 2.306E-05 | global batch size: 32 | lm loss: 6.606649E+00 | loss scale: 16384.0 | grad norm: 72682.132 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39820 |
+
time (ms)
|
39821 |
+
iteration 4080/ 159576 | consumed samples: 83312 | elapsed time per iteration (ms): 14478.7 | learning rate: 2.307E-05 | global batch size: 32 | lm loss: 6.339183E+00 | loss scale: 16384.0 | grad norm: 77952.104 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39822 |
+
time (ms)
|
39823 |
+
iteration 4081/ 159576 | consumed samples: 83344 | elapsed time per iteration (ms): 14534.3 | learning rate: 2.308E-05 | global batch size: 32 | lm loss: 6.482682E+00 | loss scale: 16384.0 | grad norm: 78541.532 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39824 |
+
time (ms)
|
39825 |
+
iteration 4082/ 159576 | consumed samples: 83376 | elapsed time per iteration (ms): 14971.6 | learning rate: 2.309E-05 | global batch size: 32 | lm loss: 6.464870E+00 | loss scale: 16384.0 | grad norm: 82812.736 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39826 |
+
time (ms)
|
39827 |
+
iteration 4083/ 159576 | consumed samples: 83408 | elapsed time per iteration (ms): 14619.1 | learning rate: 2.310E-05 | global batch size: 32 | lm loss: 6.468065E+00 | loss scale: 16384.0 | grad norm: 95549.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39828 |
+
time (ms)
|
39829 |
+
iteration 4084/ 159576 | consumed samples: 83440 | elapsed time per iteration (ms): 14580.8 | learning rate: 2.311E-05 | global batch size: 32 | lm loss: 6.390970E+00 | loss scale: 16384.0 | grad norm: 76775.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39830 |
+
time (ms)
|
39831 |
+
iteration 4085/ 159576 | consumed samples: 83472 | elapsed time per iteration (ms): 14597.4 | learning rate: 2.312E-05 | global batch size: 32 | lm loss: 6.441597E+00 | loss scale: 16384.0 | grad norm: 87885.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39832 |
+
time (ms)
|
39833 |
+
iteration 4086/ 159576 | consumed samples: 83504 | elapsed time per iteration (ms): 14827.9 | learning rate: 2.313E-05 | global batch size: 32 | lm loss: 6.332308E+00 | loss scale: 16384.0 | grad norm: 67530.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39834 |
+
time (ms)
|
39835 |
+
iteration 4087/ 159576 | consumed samples: 83536 | elapsed time per iteration (ms): 14496.3 | learning rate: 2.313E-05 | global batch size: 32 | lm loss: 6.360069E+00 | loss scale: 16384.0 | grad norm: 65277.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39836 |
+
time (ms)
|
39837 |
+
iteration 4088/ 159576 | consumed samples: 83568 | elapsed time per iteration (ms): 14505.1 | learning rate: 2.314E-05 | global batch size: 32 | lm loss: 6.331870E+00 | loss scale: 16384.0 | grad norm: 73276.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39838 |
+
time (ms)
|
39839 |
+
iteration 4089/ 159576 | consumed samples: 83600 | elapsed time per iteration (ms): 14518.3 | learning rate: 2.315E-05 | global batch size: 32 | lm loss: 6.279953E+00 | loss scale: 16384.0 | grad norm: 69193.657 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39840 |
+
time (ms)
|
39841 |
+
iteration 4090/ 159576 | consumed samples: 83632 | elapsed time per iteration (ms): 14816.9 | learning rate: 2.316E-05 | global batch size: 32 | lm loss: 6.473932E+00 | loss scale: 16384.0 | grad norm: 78838.749 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39842 |
+
time (ms)
|
39843 |
+
iteration 4091/ 159576 | consumed samples: 83664 | elapsed time per iteration (ms): 14589.1 | learning rate: 2.317E-05 | global batch size: 32 | lm loss: 6.346605E+00 | loss scale: 16384.0 | grad norm: 76401.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39844 |
+
time (ms)
|
39845 |
+
iteration 4092/ 159576 | consumed samples: 83696 | elapsed time per iteration (ms): 14611.5 | learning rate: 2.318E-05 | global batch size: 32 | lm loss: 6.444325E+00 | loss scale: 16384.0 | grad norm: 85411.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39846 |
+
time (ms)
|
39847 |
+
iteration 4093/ 159576 | consumed samples: 83728 | elapsed time per iteration (ms): 14540.2 | learning rate: 2.319E-05 | global batch size: 32 | lm loss: 6.498468E+00 | loss scale: 16384.0 | grad norm: 97013.865 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39848 |
+
time (ms)
|
39849 |
+
iteration 4094/ 159576 | consumed samples: 83760 | elapsed time per iteration (ms): 14934.5 | learning rate: 2.320E-05 | global batch size: 32 | lm loss: 6.368524E+00 | loss scale: 16384.0 | grad norm: 75310.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39850 |
+
time (ms)
|
39851 |
+
iteration 4095/ 159576 | consumed samples: 83792 | elapsed time per iteration (ms): 14479.4 | learning rate: 2.321E-05 | global batch size: 32 | lm loss: 6.445729E+00 | loss scale: 16384.0 | grad norm: 79666.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39852 |
+
time (ms)
|
39853 |
+
iteration 4096/ 159576 | consumed samples: 83824 | elapsed time per iteration (ms): 14539.3 | learning rate: 2.321E-05 | global batch size: 32 | lm loss: 6.478226E+00 | loss scale: 16384.0 | grad norm: 74953.641 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39854 |
+
time (ms)
|
39855 |
+
iteration 4097/ 159576 | consumed samples: 83856 | elapsed time per iteration (ms): 14544.9 | learning rate: 2.322E-05 | global batch size: 32 | lm loss: 6.494800E+00 | loss scale: 16384.0 | grad norm: 83444.792 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39856 |
+
time (ms)
|
39857 |
+
iteration 4098/ 159576 | consumed samples: 83888 | elapsed time per iteration (ms): 14987.3 | learning rate: 2.323E-05 | global batch size: 32 | lm loss: 6.549989E+00 | loss scale: 16384.0 | grad norm: 73065.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39858 |
+
time (ms)
|
39859 |
+
iteration 4099/ 159576 | consumed samples: 83920 | elapsed time per iteration (ms): 14510.7 | learning rate: 2.324E-05 | global batch size: 32 | lm loss: 6.523539E+00 | loss scale: 16384.0 | grad norm: 83625.749 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39860 |
+
time (ms)
|
39861 |
+
iteration 4100/ 159576 | consumed samples: 83952 | elapsed time per iteration (ms): 14610.5 | learning rate: 2.325E-05 | global batch size: 32 | lm loss: 6.451036E+00 | loss scale: 16384.0 | grad norm: 74563.493 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39862 |
+
time (ms)
|
39863 |
+
iteration 4101/ 159576 | consumed samples: 83984 | elapsed time per iteration (ms): 14604.4 | learning rate: 2.326E-05 | global batch size: 32 | lm loss: 6.472479E+00 | loss scale: 16384.0 | grad norm: 109783.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39864 |
+
time (ms)
|
39865 |
+
iteration 4102/ 159576 | consumed samples: 84016 | elapsed time per iteration (ms): 14804.2 | learning rate: 2.327E-05 | global batch size: 32 | lm loss: 6.392324E+00 | loss scale: 16384.0 | grad norm: 77708.767 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39866 |
+
time (ms)
|
39867 |
+
iteration 4103/ 159576 | consumed samples: 84048 | elapsed time per iteration (ms): 14666.7 | learning rate: 2.328E-05 | global batch size: 32 | lm loss: 6.388014E+00 | loss scale: 16384.0 | grad norm: 72228.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39868 |
+
time (ms)
|
39869 |
+
iteration 4104/ 159576 | consumed samples: 84080 | elapsed time per iteration (ms): 14567.0 | learning rate: 2.329E-05 | global batch size: 32 | lm loss: 6.351237E+00 | loss scale: 16384.0 | grad norm: 75762.926 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39870 |
+
time (ms)
|
39871 |
+
iteration 4105/ 159576 | consumed samples: 84112 | elapsed time per iteration (ms): 14512.3 | learning rate: 2.329E-05 | global batch size: 32 | lm loss: 6.445687E+00 | loss scale: 16384.0 | grad norm: 71985.473 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39872 |
+
time (ms)
|
39873 |
+
iteration 4106/ 159576 | consumed samples: 84144 | elapsed time per iteration (ms): 14555.0 | learning rate: 2.330E-05 | global batch size: 32 | lm loss: 6.450569E+00 | loss scale: 16384.0 | grad norm: 70873.734 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39874 |
+
time (ms)
|
39875 |
+
iteration 4107/ 159576 | consumed samples: 84176 | elapsed time per iteration (ms): 14836.4 | learning rate: 2.331E-05 | global batch size: 32 | lm loss: 6.490268E+00 | loss scale: 16384.0 | grad norm: 62324.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39876 |
+
time (ms)
|
39877 |
+
iteration 4108/ 159576 | consumed samples: 84208 | elapsed time per iteration (ms): 14607.5 | learning rate: 2.332E-05 | global batch size: 32 | lm loss: 6.503112E+00 | loss scale: 16384.0 | grad norm: 80147.648 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39878 |
+
time (ms)
|
39879 |
+
iteration 4109/ 159576 | consumed samples: 84240 | elapsed time per iteration (ms): 14516.1 | learning rate: 2.333E-05 | global batch size: 32 | lm loss: 6.575756E+00 | loss scale: 16384.0 | grad norm: 85277.958 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39880 |
+
time (ms)
|
39881 |
+
iteration 4110/ 159576 | consumed samples: 84272 | elapsed time per iteration (ms): 14534.3 | learning rate: 2.334E-05 | global batch size: 32 | lm loss: 6.521991E+00 | loss scale: 16384.0 | grad norm: 88147.911 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39882 |
+
time (ms)
|
39883 |
+
iteration 4111/ 159576 | consumed samples: 84304 | elapsed time per iteration (ms): 14643.4 | learning rate: 2.335E-05 | global batch size: 32 | lm loss: 6.583647E+00 | loss scale: 16384.0 | grad norm: 90470.119 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39884 |
+
time (ms)
|
39885 |
+
iteration 4112/ 159576 | consumed samples: 84336 | elapsed time per iteration (ms): 14501.6 | learning rate: 2.336E-05 | global batch size: 32 | lm loss: 6.307788E+00 | loss scale: 16384.0 | grad norm: 84679.029 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39886 |
+
time (ms)
|
39887 |
+
iteration 4113/ 159576 | consumed samples: 84368 | elapsed time per iteration (ms): 14565.5 | learning rate: 2.337E-05 | global batch size: 32 | lm loss: 6.392709E+00 | loss scale: 16384.0 | grad norm: 85222.050 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39888 |
+
time (ms)
|
39889 |
+
iteration 4114/ 159576 | consumed samples: 84400 | elapsed time per iteration (ms): 14580.4 | learning rate: 2.337E-05 | global batch size: 32 | lm loss: 6.384982E+00 | loss scale: 16384.0 | grad norm: 101932.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39890 |
+
time (ms)
|
39891 |
+
iteration 4115/ 159576 | consumed samples: 84432 | elapsed time per iteration (ms): 14793.7 | learning rate: 2.338E-05 | global batch size: 32 | lm loss: 6.402984E+00 | loss scale: 16384.0 | grad norm: 80725.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39892 |
+
time (ms)
|
39893 |
+
iteration 4116/ 159576 | consumed samples: 84464 | elapsed time per iteration (ms): 14599.8 | learning rate: 2.339E-05 | global batch size: 32 | lm loss: 6.431032E+00 | loss scale: 16384.0 | grad norm: 88365.957 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39894 |
+
time (ms)
|
39895 |
+
iteration 4117/ 159576 | consumed samples: 84496 | elapsed time per iteration (ms): 14529.0 | learning rate: 2.340E-05 | global batch size: 32 | lm loss: 6.544386E+00 | loss scale: 16384.0 | grad norm: 94647.177 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39896 |
+
time (ms)
|
39897 |
+
iteration 4118/ 159576 | consumed samples: 84528 | elapsed time per iteration (ms): 14520.8 | learning rate: 2.341E-05 | global batch size: 32 | lm loss: 6.494756E+00 | loss scale: 16384.0 | grad norm: 127914.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39898 |
+
time (ms)
|
39899 |
+
iteration 4119/ 159576 | consumed samples: 84560 | elapsed time per iteration (ms): 14810.4 | learning rate: 2.342E-05 | global batch size: 32 | lm loss: 6.676927E+00 | loss scale: 16384.0 | grad norm: 255152.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39900 |
+
time (ms)
|
39901 |
+
iteration 4120/ 159576 | consumed samples: 84592 | elapsed time per iteration (ms): 14553.6 | learning rate: 2.343E-05 | global batch size: 32 | lm loss: 6.521421E+00 | loss scale: 16384.0 | grad norm: 88738.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39902 |
+
time (ms)
|
39903 |
+
iteration 4121/ 159576 | consumed samples: 84624 | elapsed time per iteration (ms): 14615.1 | learning rate: 2.344E-05 | global batch size: 32 | lm loss: 6.422895E+00 | loss scale: 16384.0 | grad norm: 69394.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39904 |
+
time (ms)
|
39905 |
+
iteration 4122/ 159576 | consumed samples: 84656 | elapsed time per iteration (ms): 14526.7 | learning rate: 2.345E-05 | global batch size: 32 | lm loss: 6.391778E+00 | loss scale: 16384.0 | grad norm: 75006.078 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39906 |
+
time (ms)
|
39907 |
+
iteration 4123/ 159576 | consumed samples: 84688 | elapsed time per iteration (ms): 14981.6 | learning rate: 2.345E-05 | global batch size: 32 | lm loss: 6.569616E+00 | loss scale: 16384.0 | grad norm: 89357.812 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39908 |
+
time (ms)
|
39909 |
+
iteration 4124/ 159576 | consumed samples: 84720 | elapsed time per iteration (ms): 14751.3 | learning rate: 2.346E-05 | global batch size: 32 | lm loss: 6.522147E+00 | loss scale: 16384.0 | grad norm: 83006.179 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39910 |
+
time (ms)
|
39911 |
+
iteration 4125/ 159576 | consumed samples: 84752 | elapsed time per iteration (ms): 14464.7 | learning rate: 2.347E-05 | global batch size: 32 | lm loss: 6.443343E+00 | loss scale: 16384.0 | grad norm: 85692.827 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39912 |
+
time (ms)
|
39913 |
+
iteration 4126/ 159576 | consumed samples: 84784 | elapsed time per iteration (ms): 14544.8 | learning rate: 2.348E-05 | global batch size: 32 | lm loss: 6.447396E+00 | loss scale: 16384.0 | grad norm: 75026.495 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39914 |
+
time (ms)
|
39915 |
+
iteration 4127/ 159576 | consumed samples: 84816 | elapsed time per iteration (ms): 14837.3 | learning rate: 2.349E-05 | global batch size: 32 | lm loss: 6.407457E+00 | loss scale: 16384.0 | grad norm: 68031.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39916 |
+
time (ms)
|
39917 |
+
iteration 4128/ 159576 | consumed samples: 84848 | elapsed time per iteration (ms): 14497.8 | learning rate: 2.350E-05 | global batch size: 32 | lm loss: 6.509037E+00 | loss scale: 16384.0 | grad norm: 81823.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39918 |
+
time (ms)
|
39919 |
+
iteration 4129/ 159576 | consumed samples: 84880 | elapsed time per iteration (ms): 14560.1 | learning rate: 2.351E-05 | global batch size: 32 | lm loss: 6.349816E+00 | loss scale: 16384.0 | grad norm: 72346.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39920 |
+
time (ms)
|
39921 |
+
iteration 4130/ 159576 | consumed samples: 84912 | elapsed time per iteration (ms): 14548.5 | learning rate: 2.352E-05 | global batch size: 32 | lm loss: 6.479569E+00 | loss scale: 16384.0 | grad norm: 87336.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39922 |
+
time (ms)
|
39923 |
+
iteration 4131/ 159576 | consumed samples: 84944 | elapsed time per iteration (ms): 14910.1 | learning rate: 2.353E-05 | global batch size: 32 | lm loss: 6.617517E+00 | loss scale: 16384.0 | grad norm: 86374.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39924 |
+
time (ms)
|
39925 |
+
iteration 4132/ 159576 | consumed samples: 84976 | elapsed time per iteration (ms): 14494.2 | learning rate: 2.353E-05 | global batch size: 32 | lm loss: 6.465295E+00 | loss scale: 16384.0 | grad norm: 84022.738 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39926 |
+
time (ms)
|
39927 |
+
iteration 4133/ 159576 | consumed samples: 85008 | elapsed time per iteration (ms): 14507.6 | learning rate: 2.354E-05 | global batch size: 32 | lm loss: 6.496157E+00 | loss scale: 16384.0 | grad norm: 84787.804 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39928 |
+
time (ms)
|
39929 |
+
iteration 4134/ 159576 | consumed samples: 85040 | elapsed time per iteration (ms): 14524.7 | learning rate: 2.355E-05 | global batch size: 32 | lm loss: 6.413724E+00 | loss scale: 16384.0 | grad norm: 85852.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39930 |
+
time (ms)
|
39931 |
+
iteration 4135/ 159576 | consumed samples: 85072 | elapsed time per iteration (ms): 14838.8 | learning rate: 2.356E-05 | global batch size: 32 | lm loss: 6.625166E+00 | loss scale: 16384.0 | grad norm: 94635.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39932 |
+
time (ms)
|
39933 |
+
iteration 4136/ 159576 | consumed samples: 85104 | elapsed time per iteration (ms): 14542.4 | learning rate: 2.357E-05 | global batch size: 32 | lm loss: 6.407034E+00 | loss scale: 16384.0 | grad norm: 84861.680 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39934 |
+
time (ms)
|
39935 |
+
iteration 4137/ 159576 | consumed samples: 85136 | elapsed time per iteration (ms): 14613.1 | learning rate: 2.358E-05 | global batch size: 32 | lm loss: 6.522691E+00 | loss scale: 16384.0 | grad norm: 90819.589 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39936 |
+
time (ms)
|
39937 |
+
iteration 4138/ 159576 | consumed samples: 85168 | elapsed time per iteration (ms): 14588.1 | learning rate: 2.359E-05 | global batch size: 32 | lm loss: 6.515704E+00 | loss scale: 16384.0 | grad norm: 84641.662 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39938 |
+
time (ms)
|
39939 |
+
iteration 4139/ 159576 | consumed samples: 85200 | elapsed time per iteration (ms): 14775.7 | learning rate: 2.360E-05 | global batch size: 32 | lm loss: 6.462790E+00 | loss scale: 16384.0 | grad norm: 109335.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39940 |
+
time (ms)
|
39941 |
+
iteration 4140/ 159576 | consumed samples: 85232 | elapsed time per iteration (ms): 14632.9 | learning rate: 2.361E-05 | global batch size: 32 | lm loss: 6.565165E+00 | loss scale: 16384.0 | grad norm: 101408.740 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39942 |
+
time (ms)
|
39943 |
+
iteration 4141/ 159576 | consumed samples: 85264 | elapsed time per iteration (ms): 14488.2 | learning rate: 2.361E-05 | global batch size: 32 | lm loss: 6.378877E+00 | loss scale: 16384.0 | grad norm: 85177.703 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39944 |
+
time (ms)
|
39945 |
+
iteration 4142/ 159576 | consumed samples: 85296 | elapsed time per iteration (ms): 14538.0 | learning rate: 2.362E-05 | global batch size: 32 | lm loss: 6.464640E+00 | loss scale: 16384.0 | grad norm: 107413.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39946 |
+
time (ms)
|
39947 |
+
iteration 4143/ 159576 | consumed samples: 85328 | elapsed time per iteration (ms): 14656.2 | learning rate: 2.363E-05 | global batch size: 32 | lm loss: 6.672103E+00 | loss scale: 16384.0 | grad norm: 79187.829 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39948 |
+
time (ms)
|
39949 |
+
iteration 4144/ 159576 | consumed samples: 85360 | elapsed time per iteration (ms): 14916.7 | learning rate: 2.364E-05 | global batch size: 32 | lm loss: 6.691429E+00 | loss scale: 16384.0 | grad norm: 105292.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39950 |
+
time (ms)
|
39951 |
+
iteration 4145/ 159576 | consumed samples: 85392 | elapsed time per iteration (ms): 14496.1 | learning rate: 2.365E-05 | global batch size: 32 | lm loss: 6.428411E+00 | loss scale: 16384.0 | grad norm: 81232.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39952 |
+
time (ms)
|
39953 |
+
iteration 4146/ 159576 | consumed samples: 85424 | elapsed time per iteration (ms): 14532.5 | learning rate: 2.366E-05 | global batch size: 32 | lm loss: 6.483904E+00 | loss scale: 16384.0 | grad norm: 117143.742 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39954 |
+
time (ms)
|
39955 |
+
iteration 4147/ 159576 | consumed samples: 85456 | elapsed time per iteration (ms): 14531.1 | learning rate: 2.367E-05 | global batch size: 32 | lm loss: 6.363456E+00 | loss scale: 16384.0 | grad norm: 88860.011 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39956 |
+
time (ms)
|
39957 |
+
iteration 4148/ 159576 | consumed samples: 85488 | elapsed time per iteration (ms): 14766.7 | learning rate: 2.368E-05 | global batch size: 32 | lm loss: 6.523079E+00 | loss scale: 16384.0 | grad norm: 87677.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39958 |
+
time (ms)
|
39959 |
+
iteration 4149/ 159576 | consumed samples: 85520 | elapsed time per iteration (ms): 14507.2 | learning rate: 2.368E-05 | global batch size: 32 | lm loss: 6.553520E+00 | loss scale: 16384.0 | grad norm: 121742.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39960 |
+
time (ms)
|
39961 |
+
iteration 4150/ 159576 | consumed samples: 85552 | elapsed time per iteration (ms): 14548.6 | learning rate: 2.369E-05 | global batch size: 32 | lm loss: 6.490498E+00 | loss scale: 16384.0 | grad norm: 89599.956 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39962 |
+
time (ms)
|
39963 |
+
iteration 4151/ 159576 | consumed samples: 85584 | elapsed time per iteration (ms): 14535.8 | learning rate: 2.370E-05 | global batch size: 32 | lm loss: 6.498284E+00 | loss scale: 16384.0 | grad norm: 103857.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39964 |
+
time (ms)
|
39965 |
+
iteration 4152/ 159576 | consumed samples: 85616 | elapsed time per iteration (ms): 14637.7 | learning rate: 2.371E-05 | global batch size: 32 | lm loss: 6.607250E+00 | loss scale: 16384.0 | grad norm: 80792.955 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39966 |
+
time (ms)
|
39967 |
+
iteration 4153/ 159576 | consumed samples: 85648 | elapsed time per iteration (ms): 14584.8 | learning rate: 2.372E-05 | global batch size: 32 | lm loss: 6.465719E+00 | loss scale: 16384.0 | grad norm: 76852.004 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39968 |
+
time (ms)
|
39969 |
+
iteration 4154/ 159576 | consumed samples: 85680 | elapsed time per iteration (ms): 14575.3 | learning rate: 2.373E-05 | global batch size: 32 | lm loss: 6.475266E+00 | loss scale: 16384.0 | grad norm: 87775.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39970 |
+
time (ms)
|
39971 |
+
iteration 4155/ 159576 | consumed samples: 85712 | elapsed time per iteration (ms): 14452.5 | learning rate: 2.374E-05 | global batch size: 32 | lm loss: 6.456027E+00 | loss scale: 16384.0 | grad norm: 75377.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39972 |
+
time (ms)
|
39973 |
+
iteration 4156/ 159576 | consumed samples: 85744 | elapsed time per iteration (ms): 14769.4 | learning rate: 2.375E-05 | global batch size: 32 | lm loss: 6.436621E+00 | loss scale: 16384.0 | grad norm: 86270.120 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39974 |
+
time (ms)
|
39975 |
+
iteration 4157/ 159576 | consumed samples: 85776 | elapsed time per iteration (ms): 14484.6 | learning rate: 2.376E-05 | global batch size: 32 | lm loss: 6.502521E+00 | loss scale: 16384.0 | grad norm: 77291.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39976 |
+
time (ms)
|
39977 |
+
iteration 4158/ 159576 | consumed samples: 85808 | elapsed time per iteration (ms): 14605.4 | learning rate: 2.376E-05 | global batch size: 32 | lm loss: 6.271915E+00 | loss scale: 16384.0 | grad norm: 79782.510 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39978 |
+
time (ms)
|
39979 |
+
iteration 4159/ 159576 | consumed samples: 85840 | elapsed time per iteration (ms): 14468.5 | learning rate: 2.377E-05 | global batch size: 32 | lm loss: 6.375775E+00 | loss scale: 16384.0 | grad norm: 91679.045 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39980 |
+
time (ms)
|
39981 |
+
iteration 4160/ 159576 | consumed samples: 85872 | elapsed time per iteration (ms): 15055.2 | learning rate: 2.378E-05 | global batch size: 32 | lm loss: 6.207356E+00 | loss scale: 16384.0 | grad norm: 84700.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4161/ 159576 | consumed samples: 85904 | elapsed time per iteration (ms): 14639.9 | learning rate: 2.379E-05 | global batch size: 32 | lm loss: 6.385208E+00 | loss scale: 16384.0 | grad norm: 77383.793 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4162/ 159576 | consumed samples: 85936 | elapsed time per iteration (ms): 14461.5 | learning rate: 2.380E-05 | global batch size: 32 | lm loss: 6.480938E+00 | loss scale: 16384.0 | grad norm: 98154.860 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4163/ 159576 | consumed samples: 85968 | elapsed time per iteration (ms): 14557.2 | learning rate: 2.381E-05 | global batch size: 32 | lm loss: 6.427241E+00 | loss scale: 16384.0 | grad norm: 79663.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4164/ 159576 | consumed samples: 86000 | elapsed time per iteration (ms): 15046.3 | learning rate: 2.382E-05 | global batch size: 32 | lm loss: 6.310709E+00 | loss scale: 16384.0 | grad norm: 76469.866 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4165/ 159576 | consumed samples: 86032 | elapsed time per iteration (ms): 14517.1 | learning rate: 2.383E-05 | global batch size: 32 | lm loss: 6.597423E+00 | loss scale: 16384.0 | grad norm: 95179.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4166/ 159576 | consumed samples: 86064 | elapsed time per iteration (ms): 14562.4 | learning rate: 2.384E-05 | global batch size: 32 | lm loss: 6.398317E+00 | loss scale: 16384.0 | grad norm: 86889.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4167/ 159576 | consumed samples: 86096 | elapsed time per iteration (ms): 14577.1 | learning rate: 2.384E-05 | global batch size: 32 | lm loss: 6.447660E+00 | loss scale: 16384.0 | grad norm: 99510.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4168/ 159576 | consumed samples: 86128 | elapsed time per iteration (ms): 14813.0 | learning rate: 2.385E-05 | global batch size: 32 | lm loss: 6.528482E+00 | loss scale: 16384.0 | grad norm: 83413.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4169/ 159576 | consumed samples: 86160 | elapsed time per iteration (ms): 14589.9 | learning rate: 2.386E-05 | global batch size: 32 | lm loss: 6.388697E+00 | loss scale: 16384.0 | grad norm: 76722.933 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4170/ 159576 | consumed samples: 86192 | elapsed time per iteration (ms): 14519.5 | learning rate: 2.387E-05 | global batch size: 32 | lm loss: 6.446240E+00 | loss scale: 16384.0 | grad norm: 85947.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4171/ 159576 | consumed samples: 86224 | elapsed time per iteration (ms): 14524.6 | learning rate: 2.388E-05 | global batch size: 32 | lm loss: 6.425363E+00 | loss scale: 16384.0 | grad norm: 88474.007 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4172/ 159576 | consumed samples: 86256 | elapsed time per iteration (ms): 14879.2 | learning rate: 2.389E-05 | global batch size: 32 | lm loss: 6.515138E+00 | loss scale: 16384.0 | grad norm: 108134.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4173/ 159576 | consumed samples: 86288 | elapsed time per iteration (ms): 14582.3 | learning rate: 2.390E-05 | global batch size: 32 | lm loss: 6.533965E+00 | loss scale: 16384.0 | grad norm: 76749.086 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4174/ 159576 | consumed samples: 86320 | elapsed time per iteration (ms): 14543.3 | learning rate: 2.391E-05 | global batch size: 32 | lm loss: 6.448212E+00 | loss scale: 16384.0 | grad norm: 93972.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4175/ 159576 | consumed samples: 86352 | elapsed time per iteration (ms): 14572.0 | learning rate: 2.392E-05 | global batch size: 32 | lm loss: 6.440217E+00 | loss scale: 16384.0 | grad norm: 102291.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4176/ 159576 | consumed samples: 86384 | elapsed time per iteration (ms): 14897.3 | learning rate: 2.392E-05 | global batch size: 32 | lm loss: 6.324600E+00 | loss scale: 16384.0 | grad norm: 81057.900 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4177/ 159576 | consumed samples: 86416 | elapsed time per iteration (ms): 14575.9 | learning rate: 2.393E-05 | global batch size: 32 | lm loss: 6.564878E+00 | loss scale: 16384.0 | grad norm: 96270.150 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4178/ 159576 | consumed samples: 86448 | elapsed time per iteration (ms): 14585.7 | learning rate: 2.394E-05 | global batch size: 32 | lm loss: 6.473108E+00 | loss scale: 16384.0 | grad norm: 80498.059 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4179/ 159576 | consumed samples: 86480 | elapsed time per iteration (ms): 14517.6 | learning rate: 2.395E-05 | global batch size: 32 | lm loss: 6.519761E+00 | loss scale: 16384.0 | grad norm: 90509.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4180/ 159576 | consumed samples: 86512 | elapsed time per iteration (ms): 14895.7 | learning rate: 2.396E-05 | global batch size: 32 | lm loss: 6.377243E+00 | loss scale: 16384.0 | grad norm: 92370.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4181/ 159576 | consumed samples: 86544 | elapsed time per iteration (ms): 14690.0 | learning rate: 2.397E-05 | global batch size: 32 | lm loss: 6.469300E+00 | loss scale: 16384.0 | grad norm: 89492.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4182/ 159576 | consumed samples: 86576 | elapsed time per iteration (ms): 14557.6 | learning rate: 2.398E-05 | global batch size: 32 | lm loss: 6.497668E+00 | loss scale: 16384.0 | grad norm: 104899.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4183/ 159576 | consumed samples: 86608 | elapsed time per iteration (ms): 14588.2 | learning rate: 2.399E-05 | global batch size: 32 | lm loss: 6.412446E+00 | loss scale: 16384.0 | grad norm: 81267.948 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4184/ 159576 | consumed samples: 86640 | elapsed time per iteration (ms): 14486.7 | learning rate: 2.400E-05 | global batch size: 32 | lm loss: 6.486274E+00 | loss scale: 16384.0 | grad norm: 95404.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4185/ 159576 | consumed samples: 86672 | elapsed time per iteration (ms): 14942.6 | learning rate: 2.400E-05 | global batch size: 32 | lm loss: 6.375100E+00 | loss scale: 16384.0 | grad norm: 82372.004 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4186/ 159576 | consumed samples: 86704 | elapsed time per iteration (ms): 14540.4 | learning rate: 2.401E-05 | global batch size: 32 | lm loss: 6.444688E+00 | loss scale: 16384.0 | grad norm: 102268.468 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4187/ 159576 | consumed samples: 86736 | elapsed time per iteration (ms): 14530.9 | learning rate: 2.402E-05 | global batch size: 32 | lm loss: 6.270885E+00 | loss scale: 16384.0 | grad norm: 85114.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4188/ 159576 | consumed samples: 86768 | elapsed time per iteration (ms): 14554.4 | learning rate: 2.403E-05 | global batch size: 32 | lm loss: 6.461191E+00 | loss scale: 16384.0 | grad norm: 82795.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4189/ 159576 | consumed samples: 86800 | elapsed time per iteration (ms): 14680.7 | learning rate: 2.404E-05 | global batch size: 32 | lm loss: 6.483377E+00 | loss scale: 16384.0 | grad norm: 106142.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4190/ 159576 | consumed samples: 86832 | elapsed time per iteration (ms): 14652.1 | learning rate: 2.405E-05 | global batch size: 32 | lm loss: 6.468819E+00 | loss scale: 16384.0 | grad norm: 83557.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4191/ 159576 | consumed samples: 86864 | elapsed time per iteration (ms): 14459.3 | learning rate: 2.406E-05 | global batch size: 32 | lm loss: 6.379012E+00 | loss scale: 16384.0 | grad norm: 90619.727 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4192/ 159576 | consumed samples: 86896 | elapsed time per iteration (ms): 14539.1 | learning rate: 2.407E-05 | global batch size: 32 | lm loss: 6.459314E+00 | loss scale: 16384.0 | grad norm: 94282.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4193/ 159576 | consumed samples: 86928 | elapsed time per iteration (ms): 14715.7 | learning rate: 2.408E-05 | global batch size: 32 | lm loss: 6.435170E+00 | loss scale: 16384.0 | grad norm: 92946.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4194/ 159576 | consumed samples: 86960 | elapsed time per iteration (ms): 14501.7 | learning rate: 2.408E-05 | global batch size: 32 | lm loss: 6.419791E+00 | loss scale: 16384.0 | grad norm: 78251.108 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4195/ 159576 | consumed samples: 86992 | elapsed time per iteration (ms): 14523.0 | learning rate: 2.409E-05 | global batch size: 32 | lm loss: 6.342591E+00 | loss scale: 16384.0 | grad norm: 80571.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4196/ 159576 | consumed samples: 87024 | elapsed time per iteration (ms): 14595.3 | learning rate: 2.410E-05 | global batch size: 32 | lm loss: 6.373145E+00 | loss scale: 16384.0 | grad norm: 106409.932 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4197/ 159576 | consumed samples: 87056 | elapsed time per iteration (ms): 14737.5 | learning rate: 2.411E-05 | global batch size: 32 | lm loss: 6.543087E+00 | loss scale: 16384.0 | grad norm: 81359.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4198/ 159576 | consumed samples: 87088 | elapsed time per iteration (ms): 14570.3 | learning rate: 2.412E-05 | global batch size: 32 | lm loss: 6.555972E+00 | loss scale: 16384.0 | grad norm: 101442.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4199/ 159576 | consumed samples: 87120 | elapsed time per iteration (ms): 14518.0 | learning rate: 2.413E-05 | global batch size: 32 | lm loss: 6.497987E+00 | loss scale: 16384.0 | grad norm: 87789.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4200/ 159576 | consumed samples: 87152 | elapsed time per iteration (ms): 14561.0 | learning rate: 2.414E-05 | global batch size: 32 | lm loss: 6.526636E+00 | loss scale: 16384.0 | grad norm: 97375.608 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4201/ 159576 | consumed samples: 87184 | elapsed time per iteration (ms): 14967.8 | learning rate: 2.415E-05 | global batch size: 32 | lm loss: 6.529594E+00 | loss scale: 16384.0 | grad norm: 98056.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4202/ 159576 | consumed samples: 87216 | elapsed time per iteration (ms): 14591.5 | learning rate: 2.416E-05 | global batch size: 32 | lm loss: 6.461559E+00 | loss scale: 16384.0 | grad norm: 103248.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4203/ 159576 | consumed samples: 87248 | elapsed time per iteration (ms): 14557.3 | learning rate: 2.416E-05 | global batch size: 32 | lm loss: 6.255905E+00 | loss scale: 16384.0 | grad norm: 98489.984 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4204/ 159576 | consumed samples: 87280 | elapsed time per iteration (ms): 14539.8 | learning rate: 2.417E-05 | global batch size: 32 | lm loss: 6.456792E+00 | loss scale: 16384.0 | grad norm: 90220.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4205/ 159576 | consumed samples: 87312 | elapsed time per iteration (ms): 14936.2 | learning rate: 2.418E-05 | global batch size: 32 | lm loss: 6.456956E+00 | loss scale: 16384.0 | grad norm: 99591.028 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4206/ 159576 | consumed samples: 87344 | elapsed time per iteration (ms): 14602.1 | learning rate: 2.419E-05 | global batch size: 32 | lm loss: 6.539675E+00 | loss scale: 16384.0 | grad norm: 106461.971 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4207/ 159576 | consumed samples: 87376 | elapsed time per iteration (ms): 14518.5 | learning rate: 2.420E-05 | global batch size: 32 | lm loss: 6.581583E+00 | loss scale: 16384.0 | grad norm: 104474.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4208/ 159576 | consumed samples: 87408 | elapsed time per iteration (ms): 14546.2 | learning rate: 2.421E-05 | global batch size: 32 | lm loss: 6.470299E+00 | loss scale: 16384.0 | grad norm: 103936.744 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4209/ 159576 | consumed samples: 87440 | elapsed time per iteration (ms): 14895.0 | learning rate: 2.422E-05 | global batch size: 32 | lm loss: 6.485046E+00 | loss scale: 16384.0 | grad norm: 103480.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4210/ 159576 | consumed samples: 87472 | elapsed time per iteration (ms): 14490.7 | learning rate: 2.423E-05 | global batch size: 32 | lm loss: 6.331614E+00 | loss scale: 16384.0 | grad norm: 92393.675 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4211/ 159576 | consumed samples: 87504 | elapsed time per iteration (ms): 14505.6 | learning rate: 2.424E-05 | global batch size: 32 | lm loss: 6.343493E+00 | loss scale: 16384.0 | grad norm: 138840.853 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4212/ 159576 | consumed samples: 87536 | elapsed time per iteration (ms): 14559.8 | learning rate: 2.424E-05 | global batch size: 32 | lm loss: 6.362164E+00 | loss scale: 16384.0 | grad norm: 105314.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4213/ 159576 | consumed samples: 87568 | elapsed time per iteration (ms): 14962.7 | learning rate: 2.425E-05 | global batch size: 32 | lm loss: 6.413978E+00 | loss scale: 16384.0 | grad norm: 100396.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4214/ 159576 | consumed samples: 87600 | elapsed time per iteration (ms): 14459.8 | learning rate: 2.426E-05 | global batch size: 32 | lm loss: 6.333343E+00 | loss scale: 16384.0 | grad norm: 101809.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4215/ 159576 | consumed samples: 87632 | elapsed time per iteration (ms): 14541.9 | learning rate: 2.427E-05 | global batch size: 32 | lm loss: 6.552740E+00 | loss scale: 16384.0 | grad norm: 198031.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4216/ 159576 | consumed samples: 87664 | elapsed time per iteration (ms): 14546.7 | learning rate: 2.428E-05 | global batch size: 32 | lm loss: 6.373903E+00 | loss scale: 16384.0 | grad norm: 98034.031 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4217/ 159576 | consumed samples: 87696 | elapsed time per iteration (ms): 14848.3 | learning rate: 2.429E-05 | global batch size: 32 | lm loss: 6.452424E+00 | loss scale: 16384.0 | grad norm: 267522.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4218/ 159576 | consumed samples: 87728 | elapsed time per iteration (ms): 14570.6 | learning rate: 2.430E-05 | global batch size: 32 | lm loss: 6.493920E+00 | loss scale: 16384.0 | grad norm: 121372.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4219/ 159576 | consumed samples: 87760 | elapsed time per iteration (ms): 14553.1 | learning rate: 2.431E-05 | global batch size: 32 | lm loss: 6.478834E+00 | loss scale: 16384.0 | grad norm: 112151.991 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4220/ 159576 | consumed samples: 87792 | elapsed time per iteration (ms): 14546.6 | learning rate: 2.432E-05 | global batch size: 32 | lm loss: 6.452081E+00 | loss scale: 16384.0 | grad norm: 164176.147 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4221/ 159576 | consumed samples: 87824 | elapsed time per iteration (ms): 14866.7 | learning rate: 2.432E-05 | global batch size: 32 | lm loss: 6.616721E+00 | loss scale: 16384.0 | grad norm: 88412.117 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4222/ 159576 | consumed samples: 87856 | elapsed time per iteration (ms): 14831.9 | learning rate: 2.433E-05 | global batch size: 32 | lm loss: 6.396004E+00 | loss scale: 16384.0 | grad norm: 116548.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4223/ 159576 | consumed samples: 87888 | elapsed time per iteration (ms): 14530.1 | learning rate: 2.434E-05 | global batch size: 32 | lm loss: 6.223457E+00 | loss scale: 16384.0 | grad norm: 151936.770 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4224/ 159576 | consumed samples: 87920 | elapsed time per iteration (ms): 14526.4 | learning rate: 2.435E-05 | global batch size: 32 | lm loss: 6.471479E+00 | loss scale: 16384.0 | grad norm: 107150.884 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4225/ 159576 | consumed samples: 87952 | elapsed time per iteration (ms): 14556.3 | learning rate: 2.436E-05 | global batch size: 32 | lm loss: 6.420123E+00 | loss scale: 16384.0 | grad norm: 118336.101 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4226/ 159576 | consumed samples: 87984 | elapsed time per iteration (ms): 14779.5 | learning rate: 2.437E-05 | global batch size: 32 | lm loss: 6.463729E+00 | loss scale: 16384.0 | grad norm: 105104.920 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4227/ 159576 | consumed samples: 88016 | elapsed time per iteration (ms): 14616.1 | learning rate: 2.438E-05 | global batch size: 32 | lm loss: 6.384348E+00 | loss scale: 16384.0 | grad norm: 121857.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4228/ 159576 | consumed samples: 88048 | elapsed time per iteration (ms): 14595.0 | learning rate: 2.439E-05 | global batch size: 32 | lm loss: 6.562186E+00 | loss scale: 16384.0 | grad norm: 120895.871 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4229/ 159576 | consumed samples: 88080 | elapsed time per iteration (ms): 14592.9 | learning rate: 2.439E-05 | global batch size: 32 | lm loss: 6.614166E+00 | loss scale: 16384.0 | grad norm: 141989.840 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4230/ 159576 | consumed samples: 88112 | elapsed time per iteration (ms): 14745.8 | learning rate: 2.440E-05 | global batch size: 32 | lm loss: 6.416856E+00 | loss scale: 16384.0 | grad norm: 135385.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4231/ 159576 | consumed samples: 88144 | elapsed time per iteration (ms): 14547.3 | learning rate: 2.441E-05 | global batch size: 32 | lm loss: 6.576384E+00 | loss scale: 16384.0 | grad norm: 129034.853 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4232/ 159576 | consumed samples: 88176 | elapsed time per iteration (ms): 14539.9 | learning rate: 2.442E-05 | global batch size: 32 | lm loss: 6.371499E+00 | loss scale: 16384.0 | grad norm: 102463.674 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4233/ 159576 | consumed samples: 88208 | elapsed time per iteration (ms): 14580.8 | learning rate: 2.443E-05 | global batch size: 32 | lm loss: 6.598085E+00 | loss scale: 16384.0 | grad norm: 105075.872 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4234/ 159576 | consumed samples: 88240 | elapsed time per iteration (ms): 14766.2 | learning rate: 2.444E-05 | global batch size: 32 | lm loss: 6.536204E+00 | loss scale: 16384.0 | grad norm: 109004.528 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4235/ 159576 | consumed samples: 88272 | elapsed time per iteration (ms): 14518.0 | learning rate: 2.445E-05 | global batch size: 32 | lm loss: 6.663161E+00 | loss scale: 16384.0 | grad norm: 197099.956 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4236/ 159576 | consumed samples: 88304 | elapsed time per iteration (ms): 14598.2 | learning rate: 2.446E-05 | global batch size: 32 | lm loss: 6.451008E+00 | loss scale: 16384.0 | grad norm: 125746.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4237/ 159576 | consumed samples: 88336 | elapsed time per iteration (ms): 14568.7 | learning rate: 2.447E-05 | global batch size: 32 | lm loss: 6.306778E+00 | loss scale: 16384.0 | grad norm: 145717.953 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4238/ 159576 | consumed samples: 88368 | elapsed time per iteration (ms): 14844.4 | learning rate: 2.447E-05 | global batch size: 32 | lm loss: 6.637146E+00 | loss scale: 16384.0 | grad norm: 161986.022 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4239/ 159576 | consumed samples: 88400 | elapsed time per iteration (ms): 14550.6 | learning rate: 2.448E-05 | global batch size: 32 | lm loss: 6.518569E+00 | loss scale: 16384.0 | grad norm: 114815.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4240/ 159576 | consumed samples: 88432 | elapsed time per iteration (ms): 14540.5 | learning rate: 2.449E-05 | global batch size: 32 | lm loss: 6.644086E+00 | loss scale: 16384.0 | grad norm: 127083.954 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4241/ 159576 | consumed samples: 88464 | elapsed time per iteration (ms): 14556.9 | learning rate: 2.450E-05 | global batch size: 32 | lm loss: 6.359149E+00 | loss scale: 16384.0 | grad norm: 119916.985 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4242/ 159576 | consumed samples: 88496 | elapsed time per iteration (ms): 14950.3 | learning rate: 2.451E-05 | global batch size: 32 | lm loss: 6.517668E+00 | loss scale: 16384.0 | grad norm: 116850.173 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4243/ 159576 | consumed samples: 88528 | elapsed time per iteration (ms): 14575.9 | learning rate: 2.452E-05 | global batch size: 32 | lm loss: 6.345152E+00 | loss scale: 16384.0 | grad norm: 106829.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4244/ 159576 | consumed samples: 88560 | elapsed time per iteration (ms): 14588.0 | learning rate: 2.453E-05 | global batch size: 32 | lm loss: 6.476923E+00 | loss scale: 16384.0 | grad norm: 121409.721 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4245/ 159576 | consumed samples: 88592 | elapsed time per iteration (ms): 14539.0 | learning rate: 2.454E-05 | global batch size: 32 | lm loss: 6.428369E+00 | loss scale: 16384.0 | grad norm: 99872.898 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4246/ 159576 | consumed samples: 88624 | elapsed time per iteration (ms): 15044.1 | learning rate: 2.455E-05 | global batch size: 32 | lm loss: 6.447415E+00 | loss scale: 16384.0 | grad norm: 102765.648 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4247/ 159576 | consumed samples: 88656 | elapsed time per iteration (ms): 14546.9 | learning rate: 2.455E-05 | global batch size: 32 | lm loss: 6.336578E+00 | loss scale: 16384.0 | grad norm: 90835.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4248/ 159576 | consumed samples: 88688 | elapsed time per iteration (ms): 14540.1 | learning rate: 2.456E-05 | global batch size: 32 | lm loss: 6.555513E+00 | loss scale: 16384.0 | grad norm: 104407.993 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4249/ 159576 | consumed samples: 88720 | elapsed time per iteration (ms): 14613.4 | learning rate: 2.457E-05 | global batch size: 32 | lm loss: 6.546042E+00 | loss scale: 16384.0 | grad norm: 115379.011 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4250/ 159576 | consumed samples: 88752 | elapsed time per iteration (ms): 14829.6 | learning rate: 2.458E-05 | global batch size: 32 | lm loss: 6.436588E+00 | loss scale: 16384.0 | grad norm: 107293.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4251/ 159576 | consumed samples: 88784 | elapsed time per iteration (ms): 14544.9 | learning rate: 2.459E-05 | global batch size: 32 | lm loss: 6.438442E+00 | loss scale: 16384.0 | grad norm: 105034.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4252/ 159576 | consumed samples: 88816 | elapsed time per iteration (ms): 14563.6 | learning rate: 2.460E-05 | global batch size: 32 | lm loss: 6.473608E+00 | loss scale: 16384.0 | grad norm: 84036.769 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4253/ 159576 | consumed samples: 88848 | elapsed time per iteration (ms): 14528.1 | learning rate: 2.461E-05 | global batch size: 32 | lm loss: 6.422614E+00 | loss scale: 16384.0 | grad norm: 95068.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4254/ 159576 | consumed samples: 88880 | elapsed time per iteration (ms): 14918.1 | learning rate: 2.462E-05 | global batch size: 32 | lm loss: 6.295578E+00 | loss scale: 16384.0 | grad norm: 114489.641 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4255/ 159576 | consumed samples: 88912 | elapsed time per iteration (ms): 14525.9 | learning rate: 2.463E-05 | global batch size: 32 | lm loss: 6.416272E+00 | loss scale: 16384.0 | grad norm: 91261.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4256/ 159576 | consumed samples: 88944 | elapsed time per iteration (ms): 14525.5 | learning rate: 2.463E-05 | global batch size: 32 | lm loss: 6.517479E+00 | loss scale: 32768.0 | grad norm: 94254.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4257/ 159576 | consumed samples: 88976 | elapsed time per iteration (ms): 14555.5 | learning rate: 2.464E-05 | global batch size: 32 | lm loss: 6.469455E+00 | loss scale: 32768.0 | grad norm: 174372.981 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4258/ 159576 | consumed samples: 89008 | elapsed time per iteration (ms): 14928.2 | learning rate: 2.465E-05 | global batch size: 32 | lm loss: 6.408867E+00 | loss scale: 32768.0 | grad norm: 205212.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4259/ 159576 | consumed samples: 89040 | elapsed time per iteration (ms): 14529.5 | learning rate: 2.466E-05 | global batch size: 32 | lm loss: 6.518348E+00 | loss scale: 32768.0 | grad norm: 175125.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4260/ 159576 | consumed samples: 89072 | elapsed time per iteration (ms): 14608.9 | learning rate: 2.467E-05 | global batch size: 32 | lm loss: 6.456366E+00 | loss scale: 32768.0 | grad norm: 180925.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4261/ 159576 | consumed samples: 89104 | elapsed time per iteration (ms): 14541.2 | learning rate: 2.468E-05 | global batch size: 32 | lm loss: 6.688640E+00 | loss scale: 32768.0 | grad norm: 205129.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4262/ 159576 | consumed samples: 89136 | elapsed time per iteration (ms): 14984.8 | learning rate: 2.469E-05 | global batch size: 32 | lm loss: 6.381848E+00 | loss scale: 32768.0 | grad norm: 194086.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4263/ 159576 | consumed samples: 89168 | elapsed time per iteration (ms): 14627.4 | learning rate: 2.470E-05 | global batch size: 32 | lm loss: 6.325251E+00 | loss scale: 32768.0 | grad norm: 200329.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4264/ 159576 | consumed samples: 89200 | elapsed time per iteration (ms): 14514.4 | learning rate: 2.471E-05 | global batch size: 32 | lm loss: 6.384187E+00 | loss scale: 32768.0 | grad norm: 206513.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4265/ 159576 | consumed samples: 89232 | elapsed time per iteration (ms): 14532.8 | learning rate: 2.471E-05 | global batch size: 32 | lm loss: 6.524798E+00 | loss scale: 32768.0 | grad norm: 207588.856 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4266/ 159576 | consumed samples: 89264 | elapsed time per iteration (ms): 14499.0 | learning rate: 2.472E-05 | global batch size: 32 | lm loss: 6.427965E+00 | loss scale: 32768.0 | grad norm: 270396.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4267/ 159576 | consumed samples: 89296 | elapsed time per iteration (ms): 14964.3 | learning rate: 2.473E-05 | global batch size: 32 | lm loss: 6.508441E+00 | loss scale: 32768.0 | grad norm: 256825.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4268/ 159576 | consumed samples: 89328 | elapsed time per iteration (ms): 14573.4 | learning rate: 2.474E-05 | global batch size: 32 | lm loss: 6.281446E+00 | loss scale: 32768.0 | grad norm: 175050.841 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4269/ 159576 | consumed samples: 89360 | elapsed time per iteration (ms): 14497.3 | learning rate: 2.475E-05 | global batch size: 32 | lm loss: 6.477619E+00 | loss scale: 32768.0 | grad norm: 194699.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4270/ 159576 | consumed samples: 89392 | elapsed time per iteration (ms): 14560.8 | learning rate: 2.476E-05 | global batch size: 32 | lm loss: 6.521669E+00 | loss scale: 32768.0 | grad norm: 204025.819 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4271/ 159576 | consumed samples: 89424 | elapsed time per iteration (ms): 14634.9 | learning rate: 2.477E-05 | global batch size: 32 | lm loss: 6.532991E+00 | loss scale: 32768.0 | grad norm: 218350.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4272/ 159576 | consumed samples: 89456 | elapsed time per iteration (ms): 14566.6 | learning rate: 2.478E-05 | global batch size: 32 | lm loss: 6.491451E+00 | loss scale: 32768.0 | grad norm: 196213.759 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4273/ 159576 | consumed samples: 89488 | elapsed time per iteration (ms): 14504.5 | learning rate: 2.479E-05 | global batch size: 32 | lm loss: 6.527338E+00 | loss scale: 32768.0 | grad norm: 254430.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4274/ 159576 | consumed samples: 89520 | elapsed time per iteration (ms): 14538.5 | learning rate: 2.479E-05 | global batch size: 32 | lm loss: 6.303001E+00 | loss scale: 32768.0 | grad norm: 189173.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4275/ 159576 | consumed samples: 89552 | elapsed time per iteration (ms): 14691.4 | learning rate: 2.480E-05 | global batch size: 32 | lm loss: 6.465518E+00 | loss scale: 32768.0 | grad norm: 266867.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4276/ 159576 | consumed samples: 89584 | elapsed time per iteration (ms): 14571.4 | learning rate: 2.481E-05 | global batch size: 32 | lm loss: 6.562708E+00 | loss scale: 32768.0 | grad norm: 213181.091 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4277/ 159576 | consumed samples: 89616 | elapsed time per iteration (ms): 14513.3 | learning rate: 2.482E-05 | global batch size: 32 | lm loss: 6.490031E+00 | loss scale: 32768.0 | grad norm: 200238.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4278/ 159576 | consumed samples: 89648 | elapsed time per iteration (ms): 14545.3 | learning rate: 2.483E-05 | global batch size: 32 | lm loss: 6.452188E+00 | loss scale: 32768.0 | grad norm: 209603.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4279/ 159576 | consumed samples: 89680 | elapsed time per iteration (ms): 14892.6 | learning rate: 2.484E-05 | global batch size: 32 | lm loss: 6.402837E+00 | loss scale: 32768.0 | grad norm: 213512.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4280/ 159576 | consumed samples: 89712 | elapsed time per iteration (ms): 14552.6 | learning rate: 2.485E-05 | global batch size: 32 | lm loss: 6.481530E+00 | loss scale: 32768.0 | grad norm: 218939.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4281/ 159576 | consumed samples: 89744 | elapsed time per iteration (ms): 14525.9 | learning rate: 2.486E-05 | global batch size: 32 | lm loss: 6.481557E+00 | loss scale: 32768.0 | grad norm: 211553.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4282/ 159576 | consumed samples: 89776 | elapsed time per iteration (ms): 14536.1 | learning rate: 2.487E-05 | global batch size: 32 | lm loss: 6.396571E+00 | loss scale: 32768.0 | grad norm: 200119.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4283/ 159576 | consumed samples: 89808 | elapsed time per iteration (ms): 14897.4 | learning rate: 2.487E-05 | global batch size: 32 | lm loss: 6.437448E+00 | loss scale: 32768.0 | grad norm: 211733.893 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4284/ 159576 | consumed samples: 89840 | elapsed time per iteration (ms): 14635.9 | learning rate: 2.488E-05 | global batch size: 32 | lm loss: 6.477830E+00 | loss scale: 32768.0 | grad norm: 273937.689 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4285/ 159576 | consumed samples: 89872 | elapsed time per iteration (ms): 14565.4 | learning rate: 2.489E-05 | global batch size: 32 | lm loss: 6.567824E+00 | loss scale: 32768.0 | grad norm: 210402.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4286/ 159576 | consumed samples: 89904 | elapsed time per iteration (ms): 14519.6 | learning rate: 2.490E-05 | global batch size: 32 | lm loss: 6.385768E+00 | loss scale: 32768.0 | grad norm: 203200.040 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4287/ 159576 | consumed samples: 89936 | elapsed time per iteration (ms): 14914.9 | learning rate: 2.491E-05 | global batch size: 32 | lm loss: 6.397992E+00 | loss scale: 32768.0 | grad norm: 182816.610 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4288/ 159576 | consumed samples: 89968 | elapsed time per iteration (ms): 14476.6 | learning rate: 2.492E-05 | global batch size: 32 | lm loss: 6.388610E+00 | loss scale: 32768.0 | grad norm: 199735.518 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4289/ 159576 | consumed samples: 90000 | elapsed time per iteration (ms): 14570.5 | learning rate: 2.493E-05 | global batch size: 32 | lm loss: 6.506209E+00 | loss scale: 32768.0 | grad norm: 206990.921 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4290/ 159576 | consumed samples: 90032 | elapsed time per iteration (ms): 14531.9 | learning rate: 2.494E-05 | global batch size: 32 | lm loss: 6.351604E+00 | loss scale: 32768.0 | grad norm: 204481.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4291/ 159576 | consumed samples: 90064 | elapsed time per iteration (ms): 14860.6 | learning rate: 2.495E-05 | global batch size: 32 | lm loss: 6.518882E+00 | loss scale: 32768.0 | grad norm: 236219.696 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4292/ 159576 | consumed samples: 90096 | elapsed time per iteration (ms): 14581.4 | learning rate: 2.495E-05 | global batch size: 32 | lm loss: 6.428777E+00 | loss scale: 32768.0 | grad norm: 187907.904 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4293/ 159576 | consumed samples: 90128 | elapsed time per iteration (ms): 14508.1 | learning rate: 2.496E-05 | global batch size: 32 | lm loss: 6.327142E+00 | loss scale: 32768.0 | grad norm: 204872.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4294/ 159576 | consumed samples: 90160 | elapsed time per iteration (ms): 14534.7 | learning rate: 2.497E-05 | global batch size: 32 | lm loss: 6.385339E+00 | loss scale: 32768.0 | grad norm: 233375.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4295/ 159576 | consumed samples: 90192 | elapsed time per iteration (ms): 14858.3 | learning rate: 2.498E-05 | global batch size: 32 | lm loss: 6.416627E+00 | loss scale: 32768.0 | grad norm: 222806.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4296/ 159576 | consumed samples: 90224 | elapsed time per iteration (ms): 14474.6 | learning rate: 2.499E-05 | global batch size: 32 | lm loss: 6.518059E+00 | loss scale: 32768.0 | grad norm: 226593.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4297/ 159576 | consumed samples: 90256 | elapsed time per iteration (ms): 14569.0 | learning rate: 2.500E-05 | global batch size: 32 | lm loss: 6.133147E+00 | loss scale: 32768.0 | grad norm: 267419.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4298/ 159576 | consumed samples: 90288 | elapsed time per iteration (ms): 14566.4 | learning rate: 2.501E-05 | global batch size: 32 | lm loss: 6.308548E+00 | loss scale: 32768.0 | grad norm: 204598.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4299/ 159576 | consumed samples: 90320 | elapsed time per iteration (ms): 14984.7 | learning rate: 2.502E-05 | global batch size: 32 | lm loss: 6.369866E+00 | loss scale: 32768.0 | grad norm: 221545.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4300/ 159576 | consumed samples: 90352 | elapsed time per iteration (ms): 14484.6 | learning rate: 2.503E-05 | global batch size: 32 | lm loss: 6.530766E+00 | loss scale: 32768.0 | grad norm: 267800.060 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4301/ 159576 | consumed samples: 90384 | elapsed time per iteration (ms): 14557.5 | learning rate: 2.503E-05 | global batch size: 32 | lm loss: 6.503004E+00 | loss scale: 32768.0 | grad norm: 228461.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4302/ 159576 | consumed samples: 90416 | elapsed time per iteration (ms): 14550.0 | learning rate: 2.504E-05 | global batch size: 32 | lm loss: 6.538440E+00 | loss scale: 32768.0 | grad norm: 190026.980 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4303/ 159576 | consumed samples: 90448 | elapsed time per iteration (ms): 14655.7 | learning rate: 2.505E-05 | global batch size: 32 | lm loss: 6.461242E+00 | loss scale: 32768.0 | grad norm: 211257.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4304/ 159576 | consumed samples: 90480 | elapsed time per iteration (ms): 14769.1 | learning rate: 2.506E-05 | global batch size: 32 | lm loss: 6.479248E+00 | loss scale: 32768.0 | grad norm: 198712.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4305/ 159576 | consumed samples: 90512 | elapsed time per iteration (ms): 14577.3 | learning rate: 2.507E-05 | global batch size: 32 | lm loss: 6.432651E+00 | loss scale: 32768.0 | grad norm: 206822.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4306/ 159576 | consumed samples: 90544 | elapsed time per iteration (ms): 14533.2 | learning rate: 2.508E-05 | global batch size: 32 | lm loss: 6.347961E+00 | loss scale: 32768.0 | grad norm: 195748.989 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4307/ 159576 | consumed samples: 90576 | elapsed time per iteration (ms): 14563.8 | learning rate: 2.509E-05 | global batch size: 32 | lm loss: 6.507642E+00 | loss scale: 32768.0 | grad norm: 218663.158 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4308/ 159576 | consumed samples: 90608 | elapsed time per iteration (ms): 14732.7 | learning rate: 2.510E-05 | global batch size: 32 | lm loss: 6.541059E+00 | loss scale: 32768.0 | grad norm: 228970.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4309/ 159576 | consumed samples: 90640 | elapsed time per iteration (ms): 14469.9 | learning rate: 2.511E-05 | global batch size: 32 | lm loss: 6.424891E+00 | loss scale: 32768.0 | grad norm: 196198.889 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4310/ 159576 | consumed samples: 90672 | elapsed time per iteration (ms): 14508.3 | learning rate: 2.511E-05 | global batch size: 32 | lm loss: 6.490376E+00 | loss scale: 32768.0 | grad norm: 215960.903 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4311/ 159576 | consumed samples: 90704 | elapsed time per iteration (ms): 14508.3 | learning rate: 2.512E-05 | global batch size: 32 | lm loss: 6.488754E+00 | loss scale: 32768.0 | grad norm: 195374.466 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4312/ 159576 | consumed samples: 90736 | elapsed time per iteration (ms): 14753.9 | learning rate: 2.513E-05 | global batch size: 32 | lm loss: 6.448671E+00 | loss scale: 32768.0 | grad norm: 227732.025 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4313/ 159576 | consumed samples: 90768 | elapsed time per iteration (ms): 14571.8 | learning rate: 2.514E-05 | global batch size: 32 | lm loss: 6.500753E+00 | loss scale: 32768.0 | grad norm: 266264.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4314/ 159576 | consumed samples: 90800 | elapsed time per iteration (ms): 14601.7 | learning rate: 2.515E-05 | global batch size: 32 | lm loss: 6.454448E+00 | loss scale: 32768.0 | grad norm: 224312.848 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4315/ 159576 | consumed samples: 90832 | elapsed time per iteration (ms): 14520.9 | learning rate: 2.516E-05 | global batch size: 32 | lm loss: 6.340928E+00 | loss scale: 32768.0 | grad norm: 252168.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4316/ 159576 | consumed samples: 90864 | elapsed time per iteration (ms): 14650.6 | learning rate: 2.517E-05 | global batch size: 32 | lm loss: 6.524774E+00 | loss scale: 32768.0 | grad norm: 233060.511 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4317/ 159576 | consumed samples: 90896 | elapsed time per iteration (ms): 14507.8 | learning rate: 2.518E-05 | global batch size: 32 | lm loss: 6.526123E+00 | loss scale: 32768.0 | grad norm: 228145.157 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4318/ 159576 | consumed samples: 90928 | elapsed time per iteration (ms): 14505.6 | learning rate: 2.518E-05 | global batch size: 32 | lm loss: 6.554380E+00 | loss scale: 32768.0 | grad norm: 215247.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
[2021-09-24 19:07:09] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 19:07:09] PULSE: tr8-104B is running for 13:14:58 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])