bigscience-bot
committed on
Commit
·
a670b7a
1
Parent(s):
7f31139
new data
- logs/main_log.txt +497 -0
logs/main_log.txt
CHANGED
@@ -39307,3 +39307,500 @@ time (ms)
|
|
39307 |
[2021-09-24 17:07:03] PULSE: tr8-104B is running for 11:14:52 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
|
39308 |
iteration 3827/ 159576 | consumed samples: 75216 | elapsed time per iteration (ms): 14531.9 | learning rate: 2.083E-05 | global batch size: 32 | lm loss: 6.427704E+00 | loss scale: 16384.0 | grad norm: 68943.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39309 |
time (ms)
|
39310 |
+
iteration 3828/ 159576 | consumed samples: 75248 | elapsed time per iteration (ms): 14988.1 | learning rate: 2.084E-05 | global batch size: 32 | lm loss: 6.347779E+00 | loss scale: 16384.0 | grad norm: 64095.719 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39311 |
+
time (ms)
|
39312 |
+
iteration 3829/ 159576 | consumed samples: 75280 | elapsed time per iteration (ms): 14665.9 | learning rate: 2.084E-05 | global batch size: 32 | lm loss: 6.411919E+00 | loss scale: 16384.0 | grad norm: 82008.163 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39313 |
+
time (ms)
|
39314 |
+
iteration 3830/ 159576 | consumed samples: 75312 | elapsed time per iteration (ms): 14539.9 | learning rate: 2.085E-05 | global batch size: 32 | lm loss: 6.458866E+00 | loss scale: 16384.0 | grad norm: 67971.949 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39315 |
+
time (ms)
|
39316 |
+
iteration 3831/ 159576 | consumed samples: 75344 | elapsed time per iteration (ms): 14600.2 | learning rate: 2.086E-05 | global batch size: 32 | lm loss: 6.450158E+00 | loss scale: 16384.0 | grad norm: 59376.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39317 |
+
time (ms)
|
39318 |
+
iteration 3832/ 159576 | consumed samples: 75376 | elapsed time per iteration (ms): 14931.8 | learning rate: 2.087E-05 | global batch size: 32 | lm loss: 6.537256E+00 | loss scale: 16384.0 | grad norm: 77538.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39319 |
+
time (ms)
|
39320 |
+
iteration 3833/ 159576 | consumed samples: 75408 | elapsed time per iteration (ms): 14592.6 | learning rate: 2.088E-05 | global batch size: 32 | lm loss: 6.392985E+00 | loss scale: 16384.0 | grad norm: 84275.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39321 |
+
time (ms)
|
39322 |
+
iteration 3834/ 159576 | consumed samples: 75440 | elapsed time per iteration (ms): 14616.6 | learning rate: 2.089E-05 | global batch size: 32 | lm loss: 6.512251E+00 | loss scale: 16384.0 | grad norm: 80167.095 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39323 |
+
time (ms)
|
39324 |
+
iteration 3835/ 159576 | consumed samples: 75472 | elapsed time per iteration (ms): 14584.0 | learning rate: 2.090E-05 | global batch size: 32 | lm loss: 6.467295E+00 | loss scale: 16384.0 | grad norm: 85124.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39325 |
+
time (ms)
|
39326 |
+
iteration 3836/ 159576 | consumed samples: 75504 | elapsed time per iteration (ms): 14844.3 | learning rate: 2.091E-05 | global batch size: 32 | lm loss: 6.514040E+00 | loss scale: 16384.0 | grad norm: 71539.963 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39327 |
+
time (ms)
|
39328 |
+
iteration 3837/ 159576 | consumed samples: 75536 | elapsed time per iteration (ms): 14618.8 | learning rate: 2.092E-05 | global batch size: 32 | lm loss: 6.519591E+00 | loss scale: 16384.0 | grad norm: 89173.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39329 |
+
time (ms)
|
39330 |
+
iteration 3838/ 159576 | consumed samples: 75568 | elapsed time per iteration (ms): 14566.0 | learning rate: 2.092E-05 | global batch size: 32 | lm loss: 6.447284E+00 | loss scale: 16384.0 | grad norm: 86030.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39331 |
+
time (ms)
|
39332 |
+
iteration 3839/ 159576 | consumed samples: 75600 | elapsed time per iteration (ms): 14636.3 | learning rate: 2.093E-05 | global batch size: 32 | lm loss: 6.369718E+00 | loss scale: 16384.0 | grad norm: 66275.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39333 |
+
time (ms)
|
39334 |
+
iteration 3840/ 159576 | consumed samples: 75632 | elapsed time per iteration (ms): 14897.9 | learning rate: 2.094E-05 | global batch size: 32 | lm loss: 6.467171E+00 | loss scale: 16384.0 | grad norm: 82043.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39335 |
+
time (ms)
|
39336 |
+
iteration 3841/ 159576 | consumed samples: 75664 | elapsed time per iteration (ms): 14554.8 | learning rate: 2.095E-05 | global batch size: 32 | lm loss: 6.458669E+00 | loss scale: 16384.0 | grad norm: 73761.762 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39337 |
+
time (ms)
|
39338 |
+
iteration 3842/ 159576 | consumed samples: 75696 | elapsed time per iteration (ms): 14564.2 | learning rate: 2.096E-05 | global batch size: 32 | lm loss: 6.516797E+00 | loss scale: 16384.0 | grad norm: 83647.133 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39339 |
+
time (ms)
|
39340 |
+
iteration 3843/ 159576 | consumed samples: 75728 | elapsed time per iteration (ms): 14464.9 | learning rate: 2.097E-05 | global batch size: 32 | lm loss: 6.381551E+00 | loss scale: 16384.0 | grad norm: 58297.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39341 |
+
time (ms)
|
39342 |
+
iteration 3844/ 159576 | consumed samples: 75760 | elapsed time per iteration (ms): 14942.4 | learning rate: 2.098E-05 | global batch size: 32 | lm loss: 6.471825E+00 | loss scale: 16384.0 | grad norm: 82881.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39343 |
+
time (ms)
|
39344 |
+
iteration 3845/ 159576 | consumed samples: 75792 | elapsed time per iteration (ms): 14531.3 | learning rate: 2.099E-05 | global batch size: 32 | lm loss: 6.528457E+00 | loss scale: 16384.0 | grad norm: 67296.172 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39345 |
+
time (ms)
|
39346 |
+
iteration 3846/ 159576 | consumed samples: 75824 | elapsed time per iteration (ms): 14601.9 | learning rate: 2.100E-05 | global batch size: 32 | lm loss: 6.408827E+00 | loss scale: 16384.0 | grad norm: 67512.624 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39347 |
+
time (ms)
|
39348 |
+
iteration 3847/ 159576 | consumed samples: 75856 | elapsed time per iteration (ms): 14580.2 | learning rate: 2.100E-05 | global batch size: 32 | lm loss: 6.440091E+00 | loss scale: 16384.0 | grad norm: 78400.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39349 |
+
time (ms)
|
39350 |
+
iteration 3848/ 159576 | consumed samples: 75888 | elapsed time per iteration (ms): 14911.9 | learning rate: 2.101E-05 | global batch size: 32 | lm loss: 6.374573E+00 | loss scale: 16384.0 | grad norm: 85886.969 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39351 |
+
time (ms)
|
39352 |
+
iteration 3849/ 159576 | consumed samples: 75920 | elapsed time per iteration (ms): 14768.3 | learning rate: 2.102E-05 | global batch size: 32 | lm loss: 6.529835E+00 | loss scale: 16384.0 | grad norm: 71394.057 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39353 |
+
time (ms)
|
39354 |
+
iteration 3850/ 159576 | consumed samples: 75952 | elapsed time per iteration (ms): 14553.3 | learning rate: 2.103E-05 | global batch size: 32 | lm loss: 6.455585E+00 | loss scale: 16384.0 | grad norm: 67772.089 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39355 |
+
time (ms)
|
39356 |
+
iteration 3851/ 159576 | consumed samples: 75984 | elapsed time per iteration (ms): 14574.9 | learning rate: 2.104E-05 | global batch size: 32 | lm loss: 6.428284E+00 | loss scale: 16384.0 | grad norm: 110864.098 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39357 |
+
time (ms)
|
39358 |
+
iteration 3852/ 159576 | consumed samples: 76016 | elapsed time per iteration (ms): 14592.6 | learning rate: 2.105E-05 | global batch size: 32 | lm loss: 6.457644E+00 | loss scale: 16384.0 | grad norm: 73499.592 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39359 |
+
time (ms)
|
39360 |
+
iteration 3853/ 159576 | consumed samples: 76048 | elapsed time per iteration (ms): 14780.7 | learning rate: 2.106E-05 | global batch size: 32 | lm loss: 6.459057E+00 | loss scale: 16384.0 | grad norm: 71503.908 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39361 |
+
time (ms)
|
39362 |
+
iteration 3854/ 159576 | consumed samples: 76080 | elapsed time per iteration (ms): 14631.9 | learning rate: 2.107E-05 | global batch size: 32 | lm loss: 6.522111E+00 | loss scale: 16384.0 | grad norm: 73205.829 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39363 |
+
time (ms)
|
39364 |
+
iteration 3855/ 159576 | consumed samples: 76112 | elapsed time per iteration (ms): 14685.7 | learning rate: 2.108E-05 | global batch size: 32 | lm loss: 6.444643E+00 | loss scale: 16384.0 | grad norm: 70169.559 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39365 |
+
time (ms)
|
39366 |
+
iteration 3856/ 159576 | consumed samples: 76144 | elapsed time per iteration (ms): 14534.2 | learning rate: 2.108E-05 | global batch size: 32 | lm loss: 6.392300E+00 | loss scale: 16384.0 | grad norm: 81224.688 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39367 |
+
time (ms)
|
39368 |
+
iteration 3857/ 159576 | consumed samples: 76176 | elapsed time per iteration (ms): 14734.9 | learning rate: 2.109E-05 | global batch size: 32 | lm loss: 6.474737E+00 | loss scale: 16384.0 | grad norm: 76429.789 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39369 |
+
time (ms)
|
39370 |
+
iteration 3858/ 159576 | consumed samples: 76208 | elapsed time per iteration (ms): 14589.1 | learning rate: 2.110E-05 | global batch size: 32 | lm loss: 6.481500E+00 | loss scale: 16384.0 | grad norm: 76288.617 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39371 |
+
time (ms)
|
39372 |
+
iteration 3859/ 159576 | consumed samples: 76240 | elapsed time per iteration (ms): 14536.6 | learning rate: 2.111E-05 | global batch size: 32 | lm loss: 6.504058E+00 | loss scale: 16384.0 | grad norm: 75104.955 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39373 |
+
time (ms)
|
39374 |
+
iteration 3860/ 159576 | consumed samples: 76272 | elapsed time per iteration (ms): 14557.4 | learning rate: 2.112E-05 | global batch size: 32 | lm loss: 6.616935E+00 | loss scale: 16384.0 | grad norm: 73471.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39375 |
+
time (ms)
|
39376 |
+
iteration 3861/ 159576 | consumed samples: 76304 | elapsed time per iteration (ms): 14996.3 | learning rate: 2.113E-05 | global batch size: 32 | lm loss: 6.437632E+00 | loss scale: 16384.0 | grad norm: 100626.814 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39377 |
+
time (ms)
|
39378 |
+
iteration 3862/ 159576 | consumed samples: 76336 | elapsed time per iteration (ms): 14610.8 | learning rate: 2.114E-05 | global batch size: 32 | lm loss: 6.358921E+00 | loss scale: 16384.0 | grad norm: 84367.846 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39379 |
+
time (ms)
|
39380 |
+
iteration 3863/ 159576 | consumed samples: 76368 | elapsed time per iteration (ms): 14574.0 | learning rate: 2.115E-05 | global batch size: 32 | lm loss: 6.489450E+00 | loss scale: 16384.0 | grad norm: 111308.083 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39381 |
+
time (ms)
|
39382 |
+
iteration 3864/ 159576 | consumed samples: 76400 | elapsed time per iteration (ms): 14585.8 | learning rate: 2.116E-05 | global batch size: 32 | lm loss: 6.579299E+00 | loss scale: 16384.0 | grad norm: 71685.183 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39383 |
+
time (ms)
|
39384 |
+
iteration 3865/ 159576 | consumed samples: 76432 | elapsed time per iteration (ms): 14801.5 | learning rate: 2.116E-05 | global batch size: 32 | lm loss: 6.356242E+00 | loss scale: 16384.0 | grad norm: 68636.493 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39385 |
+
time (ms)
|
39386 |
+
iteration 3866/ 159576 | consumed samples: 76464 | elapsed time per iteration (ms): 14581.8 | learning rate: 2.117E-05 | global batch size: 32 | lm loss: 6.583051E+00 | loss scale: 16384.0 | grad norm: 83498.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39387 |
+
time (ms)
|
39388 |
+
iteration 3867/ 159576 | consumed samples: 76496 | elapsed time per iteration (ms): 14548.1 | learning rate: 2.118E-05 | global batch size: 32 | lm loss: 6.414474E+00 | loss scale: 16384.0 | grad norm: 70120.527 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39389 |
+
time (ms)
|
39390 |
+
iteration 3868/ 159576 | consumed samples: 76528 | elapsed time per iteration (ms): 14581.2 | learning rate: 2.119E-05 | global batch size: 32 | lm loss: 6.383676E+00 | loss scale: 16384.0 | grad norm: 65625.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39391 |
+
time (ms)
|
39392 |
+
iteration 3869/ 159576 | consumed samples: 76560 | elapsed time per iteration (ms): 14975.0 | learning rate: 2.120E-05 | global batch size: 32 | lm loss: 6.553302E+00 | loss scale: 16384.0 | grad norm: 78443.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39393 |
+
time (ms)
|
39394 |
+
iteration 3870/ 159576 | consumed samples: 76592 | elapsed time per iteration (ms): 14654.1 | learning rate: 2.121E-05 | global batch size: 32 | lm loss: 6.525763E+00 | loss scale: 16384.0 | grad norm: 74575.789 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39395 |
+
time (ms)
|
39396 |
+
iteration 3871/ 159576 | consumed samples: 76624 | elapsed time per iteration (ms): 14658.5 | learning rate: 2.122E-05 | global batch size: 32 | lm loss: 6.416959E+00 | loss scale: 16384.0 | grad norm: 61001.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39397 |
+
time (ms)
|
39398 |
+
iteration 3872/ 159576 | consumed samples: 76656 | elapsed time per iteration (ms): 14544.3 | learning rate: 2.123E-05 | global batch size: 32 | lm loss: 6.516649E+00 | loss scale: 16384.0 | grad norm: 76582.538 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39399 |
+
time (ms)
|
39400 |
+
iteration 3873/ 159576 | consumed samples: 76688 | elapsed time per iteration (ms): 14961.2 | learning rate: 2.124E-05 | global batch size: 32 | lm loss: 6.532383E+00 | loss scale: 16384.0 | grad norm: 98540.585 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39401 |
+
time (ms)
|
39402 |
+
iteration 3874/ 159576 | consumed samples: 76720 | elapsed time per iteration (ms): 14595.7 | learning rate: 2.124E-05 | global batch size: 32 | lm loss: 6.589262E+00 | loss scale: 16384.0 | grad norm: 90020.937 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39403 |
+
time (ms)
|
39404 |
+
iteration 3875/ 159576 | consumed samples: 76752 | elapsed time per iteration (ms): 14549.8 | learning rate: 2.125E-05 | global batch size: 32 | lm loss: 6.475612E+00 | loss scale: 16384.0 | grad norm: 71253.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39405 |
+
time (ms)
|
39406 |
+
iteration 3876/ 159576 | consumed samples: 76784 | elapsed time per iteration (ms): 14539.7 | learning rate: 2.126E-05 | global batch size: 32 | lm loss: 6.477540E+00 | loss scale: 16384.0 | grad norm: 113904.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39407 |
+
time (ms)
|
39408 |
+
iteration 3877/ 159576 | consumed samples: 76816 | elapsed time per iteration (ms): 14922.4 | learning rate: 2.127E-05 | global batch size: 32 | lm loss: 6.475825E+00 | loss scale: 16384.0 | grad norm: 59736.077 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39409 |
+
time (ms)
|
39410 |
+
iteration 3878/ 159576 | consumed samples: 76848 | elapsed time per iteration (ms): 14676.0 | learning rate: 2.128E-05 | global batch size: 32 | lm loss: 6.477038E+00 | loss scale: 16384.0 | grad norm: 73926.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39411 |
+
time (ms)
|
39412 |
+
iteration 3879/ 159576 | consumed samples: 76880 | elapsed time per iteration (ms): 14505.4 | learning rate: 2.129E-05 | global batch size: 32 | lm loss: 6.577363E+00 | loss scale: 16384.0 | grad norm: 65273.771 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39413 |
+
time (ms)
|
39414 |
+
iteration 3880/ 159576 | consumed samples: 76912 | elapsed time per iteration (ms): 14525.2 | learning rate: 2.130E-05 | global batch size: 32 | lm loss: 6.431276E+00 | loss scale: 16384.0 | grad norm: 62353.041 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39415 |
+
time (ms)
|
39416 |
+
iteration 3881/ 159576 | consumed samples: 76944 | elapsed time per iteration (ms): 14918.9 | learning rate: 2.131E-05 | global batch size: 32 | lm loss: 6.471975E+00 | loss scale: 16384.0 | grad norm: 80402.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39417 |
+
time (ms)
|
39418 |
+
iteration 3882/ 159576 | consumed samples: 76976 | elapsed time per iteration (ms): 14543.5 | learning rate: 2.132E-05 | global batch size: 32 | lm loss: 6.481179E+00 | loss scale: 16384.0 | grad norm: 59241.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39419 |
+
time (ms)
|
39420 |
+
iteration 3883/ 159576 | consumed samples: 77008 | elapsed time per iteration (ms): 14519.1 | learning rate: 2.132E-05 | global batch size: 32 | lm loss: 6.356431E+00 | loss scale: 16384.0 | grad norm: 66124.949 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39421 |
+
time (ms)
|
39422 |
+
iteration 3884/ 159576 | consumed samples: 77040 | elapsed time per iteration (ms): 14635.6 | learning rate: 2.133E-05 | global batch size: 32 | lm loss: 7.171796E+00 | loss scale: 16384.0 | grad norm: 628102.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39423 |
+
time (ms)
|
39424 |
+
iteration 3885/ 159576 | consumed samples: 77072 | elapsed time per iteration (ms): 14877.6 | learning rate: 2.134E-05 | global batch size: 32 | lm loss: 7.122965E+00 | loss scale: 16384.0 | grad norm: 105361.079 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39425 |
+
time (ms)
|
39426 |
+
iteration 3886/ 159576 | consumed samples: 77104 | elapsed time per iteration (ms): 14581.7 | learning rate: 2.135E-05 | global batch size: 32 | lm loss: 6.781033E+00 | loss scale: 16384.0 | grad norm: 90805.956 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39427 |
+
time (ms)
|
39428 |
+
iteration 3887/ 159576 | consumed samples: 77136 | elapsed time per iteration (ms): 14580.5 | learning rate: 2.136E-05 | global batch size: 32 | lm loss: 6.824611E+00 | loss scale: 16384.0 | grad norm: 128888.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39429 |
+
time (ms)
|
39430 |
+
iteration 3888/ 159576 | consumed samples: 77168 | elapsed time per iteration (ms): 14468.4 | learning rate: 2.137E-05 | global batch size: 32 | lm loss: 6.773994E+00 | loss scale: 16384.0 | grad norm: 67441.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39431 |
+
time (ms)
|
39432 |
+
iteration 3889/ 159576 | consumed samples: 77200 | elapsed time per iteration (ms): 14934.3 | learning rate: 2.138E-05 | global batch size: 32 | lm loss: 6.845183E+00 | loss scale: 16384.0 | grad norm: 171660.767 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39433 |
+
time (ms)
|
39434 |
+
iteration 3890/ 159576 | consumed samples: 77232 | elapsed time per iteration (ms): 14531.8 | learning rate: 2.139E-05 | global batch size: 32 | lm loss: 6.803124E+00 | loss scale: 16384.0 | grad norm: 100767.890 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39435 |
+
time (ms)
|
39436 |
+
iteration 3891/ 159576 | consumed samples: 77264 | elapsed time per iteration (ms): 14568.7 | learning rate: 2.139E-05 | global batch size: 32 | lm loss: 6.825951E+00 | loss scale: 16384.0 | grad norm: 84326.742 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39437 |
+
time (ms)
|
39438 |
+
iteration 3892/ 159576 | consumed samples: 77296 | elapsed time per iteration (ms): 14543.8 | learning rate: 2.140E-05 | global batch size: 32 | lm loss: 6.734772E+00 | loss scale: 16384.0 | grad norm: 87236.773 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39439 |
+
time (ms)
|
39440 |
+
iteration 3893/ 159576 | consumed samples: 77328 | elapsed time per iteration (ms): 14607.7 | learning rate: 2.141E-05 | global batch size: 32 | lm loss: 6.789660E+00 | loss scale: 16384.0 | grad norm: 88054.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39441 |
+
time (ms)
|
39442 |
+
iteration 3894/ 159576 | consumed samples: 77360 | elapsed time per iteration (ms): 14920.9 | learning rate: 2.142E-05 | global batch size: 32 | lm loss: 6.710454E+00 | loss scale: 16384.0 | grad norm: 182978.046 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39443 |
+
time (ms)
|
39444 |
+
iteration 3895/ 159576 | consumed samples: 77392 | elapsed time per iteration (ms): 14510.2 | learning rate: 2.143E-05 | global batch size: 32 | lm loss: 6.691602E+00 | loss scale: 16384.0 | grad norm: 119037.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39445 |
+
time (ms)
|
39446 |
+
iteration 3896/ 159576 | consumed samples: 77424 | elapsed time per iteration (ms): 14496.2 | learning rate: 2.144E-05 | global batch size: 32 | lm loss: 6.739342E+00 | loss scale: 16384.0 | grad norm: 97461.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39447 |
+
time (ms)
|
39448 |
+
iteration 3897/ 159576 | consumed samples: 77456 | elapsed time per iteration (ms): 14526.7 | learning rate: 2.145E-05 | global batch size: 32 | lm loss: 6.818674E+00 | loss scale: 16384.0 | grad norm: 86334.005 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39449 |
+
time (ms)
|
39450 |
+
iteration 3898/ 159576 | consumed samples: 77488 | elapsed time per iteration (ms): 14792.9 | learning rate: 2.146E-05 | global batch size: 32 | lm loss: 6.717194E+00 | loss scale: 16384.0 | grad norm: 113951.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39451 |
+
time (ms)
|
39452 |
+
iteration 3899/ 159576 | consumed samples: 77520 | elapsed time per iteration (ms): 14491.5 | learning rate: 2.147E-05 | global batch size: 32 | lm loss: 6.714782E+00 | loss scale: 16384.0 | grad norm: 99766.959 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39453 |
+
time (ms)
|
39454 |
+
iteration 3900/ 159576 | consumed samples: 77552 | elapsed time per iteration (ms): 14584.1 | learning rate: 2.147E-05 | global batch size: 32 | lm loss: 6.659179E+00 | loss scale: 16384.0 | grad norm: 89663.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39455 |
+
time (ms)
|
39456 |
+
iteration 3901/ 159576 | consumed samples: 77584 | elapsed time per iteration (ms): 14629.2 | learning rate: 2.148E-05 | global batch size: 32 | lm loss: 6.615579E+00 | loss scale: 16384.0 | grad norm: 68957.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39457 |
+
time (ms)
|
39458 |
+
iteration 3902/ 159576 | consumed samples: 77616 | elapsed time per iteration (ms): 14617.9 | learning rate: 2.149E-05 | global batch size: 32 | lm loss: 6.606854E+00 | loss scale: 16384.0 | grad norm: 99968.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39459 |
+
time (ms)
|
39460 |
+
iteration 3903/ 159576 | consumed samples: 77648 | elapsed time per iteration (ms): 14554.1 | learning rate: 2.150E-05 | global batch size: 32 | lm loss: 6.537298E+00 | loss scale: 16384.0 | grad norm: 67921.849 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39461 |
+
time (ms)
|
39462 |
+
iteration 3904/ 159576 | consumed samples: 77680 | elapsed time per iteration (ms): 14545.4 | learning rate: 2.151E-05 | global batch size: 32 | lm loss: 6.606940E+00 | loss scale: 16384.0 | grad norm: 145573.785 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39463 |
+
time (ms)
|
39464 |
+
iteration 3905/ 159576 | consumed samples: 77712 | elapsed time per iteration (ms): 14521.9 | learning rate: 2.152E-05 | global batch size: 32 | lm loss: 6.625298E+00 | loss scale: 16384.0 | grad norm: 96778.059 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39465 |
+
time (ms)
|
39466 |
+
iteration 3906/ 159576 | consumed samples: 77744 | elapsed time per iteration (ms): 14699.2 | learning rate: 2.153E-05 | global batch size: 32 | lm loss: 6.624491E+00 | loss scale: 16384.0 | grad norm: 92738.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39467 |
+
time (ms)
|
39468 |
+
iteration 3907/ 159576 | consumed samples: 77776 | elapsed time per iteration (ms): 14558.6 | learning rate: 2.154E-05 | global batch size: 32 | lm loss: 6.825802E+00 | loss scale: 16384.0 | grad norm: 119492.559 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39469 |
+
time (ms)
|
39470 |
+
iteration 3908/ 159576 | consumed samples: 77808 | elapsed time per iteration (ms): 14547.7 | learning rate: 2.155E-05 | global batch size: 32 | lm loss: 6.591653E+00 | loss scale: 16384.0 | grad norm: 78761.796 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39471 |
+
time (ms)
|
39472 |
+
iteration 3909/ 159576 | consumed samples: 77840 | elapsed time per iteration (ms): 14554.0 | learning rate: 2.155E-05 | global batch size: 32 | lm loss: 6.567001E+00 | loss scale: 16384.0 | grad norm: 147075.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39473 |
+
time (ms)
|
39474 |
+
iteration 3910/ 159576 | consumed samples: 77872 | elapsed time per iteration (ms): 15013.4 | learning rate: 2.156E-05 | global batch size: 32 | lm loss: 6.787440E+00 | loss scale: 16384.0 | grad norm: 142314.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39475 |
+
time (ms)
|
39476 |
+
iteration 3911/ 159576 | consumed samples: 77904 | elapsed time per iteration (ms): 14566.2 | learning rate: 2.157E-05 | global batch size: 32 | lm loss: 6.525432E+00 | loss scale: 16384.0 | grad norm: 87369.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39477 |
+
time (ms)
|
39478 |
+
iteration 3912/ 159576 | consumed samples: 77936 | elapsed time per iteration (ms): 14516.0 | learning rate: 2.158E-05 | global batch size: 32 | lm loss: 6.615817E+00 | loss scale: 16384.0 | grad norm: 83904.990 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39479 |
+
time (ms)
|
39480 |
+
iteration 3913/ 159576 | consumed samples: 77968 | elapsed time per iteration (ms): 14525.8 | learning rate: 2.159E-05 | global batch size: 32 | lm loss: 6.564670E+00 | loss scale: 16384.0 | grad norm: 97516.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39481 |
+
time (ms)
|
39482 |
+
iteration 3914/ 159576 | consumed samples: 78000 | elapsed time per iteration (ms): 15027.0 | learning rate: 2.160E-05 | global batch size: 32 | lm loss: 6.400544E+00 | loss scale: 16384.0 | grad norm: 92743.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39483 |
+
time (ms)
|
39484 |
+
iteration 3915/ 159576 | consumed samples: 78032 | elapsed time per iteration (ms): 14573.6 | learning rate: 2.161E-05 | global batch size: 32 | lm loss: 6.603245E+00 | loss scale: 16384.0 | grad norm: 106541.895 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39485 |
+
time (ms)
|
39486 |
+
iteration 3916/ 159576 | consumed samples: 78064 | elapsed time per iteration (ms): 14538.9 | learning rate: 2.162E-05 | global batch size: 32 | lm loss: 6.560642E+00 | loss scale: 16384.0 | grad norm: 71313.618 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39487 |
+
time (ms)
|
39488 |
+
iteration 3917/ 159576 | consumed samples: 78096 | elapsed time per iteration (ms): 14550.2 | learning rate: 2.163E-05 | global batch size: 32 | lm loss: 6.578140E+00 | loss scale: 16384.0 | grad norm: 83812.809 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39489 |
+
time (ms)
|
39490 |
+
iteration 3918/ 159576 | consumed samples: 78128 | elapsed time per iteration (ms): 14857.6 | learning rate: 2.163E-05 | global batch size: 32 | lm loss: 6.583351E+00 | loss scale: 16384.0 | grad norm: 69616.816 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39491 |
+
time (ms)
|
39492 |
+
iteration 3919/ 159576 | consumed samples: 78160 | elapsed time per iteration (ms): 14509.2 | learning rate: 2.164E-05 | global batch size: 32 | lm loss: 6.595952E+00 | loss scale: 16384.0 | grad norm: 83133.116 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39493 |
+
time (ms)
|
39494 |
+
iteration 3920/ 159576 | consumed samples: 78192 | elapsed time per iteration (ms): 14502.7 | learning rate: 2.165E-05 | global batch size: 32 | lm loss: 6.645111E+00 | loss scale: 16384.0 | grad norm: 69570.909 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39495 |
+
time (ms)
|
39496 |
+
iteration 3921/ 159576 | consumed samples: 78224 | elapsed time per iteration (ms): 14498.8 | learning rate: 2.166E-05 | global batch size: 32 | lm loss: 6.553501E+00 | loss scale: 16384.0 | grad norm: 142896.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39497 |
+
time (ms)
|
39498 |
+
iteration 3922/ 159576 | consumed samples: 78256 | elapsed time per iteration (ms): 14842.1 | learning rate: 2.167E-05 | global batch size: 32 | lm loss: 6.687614E+00 | loss scale: 16384.0 | grad norm: 107346.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39499 |
+
time (ms)
|
39500 |
+
iteration 3923/ 159576 | consumed samples: 78288 | elapsed time per iteration (ms): 14567.6 | learning rate: 2.168E-05 | global batch size: 32 | lm loss: 6.764112E+00 | loss scale: 16384.0 | grad norm: 75484.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39501 |
+
time (ms)
|
39502 |
+
iteration 3924/ 159576 | consumed samples: 78320 | elapsed time per iteration (ms): 14603.6 | learning rate: 2.169E-05 | global batch size: 32 | lm loss: 6.384696E+00 | loss scale: 16384.0 | grad norm: 91570.469 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39503 |
+
time (ms)
|
39504 |
+
iteration 3925/ 159576 | consumed samples: 78352 | elapsed time per iteration (ms): 14494.1 | learning rate: 2.170E-05 | global batch size: 32 | lm loss: 6.148740E+00 | loss scale: 16384.0 | grad norm: 66094.874 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39505 |
+
time (ms)
|
39506 |
+
iteration 3926/ 159576 | consumed samples: 78384 | elapsed time per iteration (ms): 14880.0 | learning rate: 2.171E-05 | global batch size: 32 | lm loss: 6.492467E+00 | loss scale: 16384.0 | grad norm: 95980.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39507 |
+
time (ms)
|
39508 |
+
iteration 3927/ 159576 | consumed samples: 78416 | elapsed time per iteration (ms): 14529.0 | learning rate: 2.171E-05 | global batch size: 32 | lm loss: 6.634668E+00 | loss scale: 16384.0 | grad norm: 102240.933 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39509 |
+
time (ms)
|
39510 |
+
iteration 3928/ 159576 | consumed samples: 78448 | elapsed time per iteration (ms): 14524.9 | learning rate: 2.172E-05 | global batch size: 32 | lm loss: 6.542571E+00 | loss scale: 16384.0 | grad norm: 78190.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39511 |
+
time (ms)
|
39512 |
+
iteration 3929/ 159576 | consumed samples: 78480 | elapsed time per iteration (ms): 14519.9 | learning rate: 2.173E-05 | global batch size: 32 | lm loss: 6.546354E+00 | loss scale: 16384.0 | grad norm: 69181.655 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39513 |
+
time (ms)
|
39514 |
+
iteration 3930/ 159576 | consumed samples: 78512 | elapsed time per iteration (ms): 14848.7 | learning rate: 2.174E-05 | global batch size: 32 | lm loss: 6.556016E+00 | loss scale: 16384.0 | grad norm: 166890.175 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39515 |
+
time (ms)
|
39516 |
+
iteration 3931/ 159576 | consumed samples: 78544 | elapsed time per iteration (ms): 14630.3 | learning rate: 2.175E-05 | global batch size: 32 | lm loss: 6.575625E+00 | loss scale: 16384.0 | grad norm: 67026.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39517 |
+
time (ms)
|
39518 |
+
iteration 3932/ 159576 | consumed samples: 78576 | elapsed time per iteration (ms): 14503.2 | learning rate: 2.176E-05 | global batch size: 32 | lm loss: 6.528583E+00 | loss scale: 16384.0 | grad norm: 65300.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39519 |
+
time (ms)
|
39520 |
+
iteration 3933/ 159576 | consumed samples: 78608 | elapsed time per iteration (ms): 14533.6 | learning rate: 2.177E-05 | global batch size: 32 | lm loss: 6.571996E+00 | loss scale: 16384.0 | grad norm: 61530.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39521 |
+
time (ms)
|
39522 |
+
iteration 3934/ 159576 | consumed samples: 78640 | elapsed time per iteration (ms): 14528.2 | learning rate: 2.178E-05 | global batch size: 32 | lm loss: 6.524823E+00 | loss scale: 16384.0 | grad norm: 58107.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39523 |
+
time (ms)
|
39524 |
+
iteration 3935/ 159576 | consumed samples: 78672 | elapsed time per iteration (ms): 14801.4 | learning rate: 2.179E-05 | global batch size: 32 | lm loss: 6.627916E+00 | loss scale: 16384.0 | grad norm: 64798.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39525 |
+
time (ms)
|
39526 |
+
iteration 3936/ 159576 | consumed samples: 78704 | elapsed time per iteration (ms): 14509.3 | learning rate: 2.179E-05 | global batch size: 32 | lm loss: 6.511620E+00 | loss scale: 16384.0 | grad norm: 59258.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39527 |
+
time (ms)
|
39528 |
+
iteration 3937/ 159576 | consumed samples: 78736 | elapsed time per iteration (ms): 14529.7 | learning rate: 2.180E-05 | global batch size: 32 | lm loss: 6.414696E+00 | loss scale: 16384.0 | grad norm: 75598.973 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39529 |
+
time (ms)
|
39530 |
+
iteration 3938/ 159576 | consumed samples: 78768 | elapsed time per iteration (ms): 14568.6 | learning rate: 2.181E-05 | global batch size: 32 | lm loss: 6.692476E+00 | loss scale: 16384.0 | grad norm: 68594.644 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39531 |
+
time (ms)
|
39532 |
+
iteration 3939/ 159576 | consumed samples: 78800 | elapsed time per iteration (ms): 14680.0 | learning rate: 2.182E-05 | global batch size: 32 | lm loss: 6.509182E+00 | loss scale: 16384.0 | grad norm: 77431.860 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39533 |
+
time (ms)
|
39534 |
+
iteration 3940/ 159576 | consumed samples: 78832 | elapsed time per iteration (ms): 14561.3 | learning rate: 2.183E-05 | global batch size: 32 | lm loss: 6.521114E+00 | loss scale: 16384.0 | grad norm: 67107.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39535 |
+
time (ms)
|
39536 |
+
iteration 3941/ 159576 | consumed samples: 78864 | elapsed time per iteration (ms): 14540.3 | learning rate: 2.184E-05 | global batch size: 32 | lm loss: 6.557777E+00 | loss scale: 16384.0 | grad norm: 82252.980 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39537 |
+
time (ms)
|
39538 |
+
iteration 3942/ 159576 | consumed samples: 78896 | elapsed time per iteration (ms): 14516.4 | learning rate: 2.185E-05 | global batch size: 32 | lm loss: 6.519272E+00 | loss scale: 16384.0 | grad norm: 62956.678 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39539 |
+
time (ms)
|
39540 |
+
iteration 3943/ 159576 | consumed samples: 78928 | elapsed time per iteration (ms): 14804.0 | learning rate: 2.186E-05 | global batch size: 32 | lm loss: 6.436077E+00 | loss scale: 16384.0 | grad norm: 63372.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39541 |
+
time (ms)
|
39542 |
+
iteration 3944/ 159576 | consumed samples: 78960 | elapsed time per iteration (ms): 14504.5 | learning rate: 2.187E-05 | global batch size: 32 | lm loss: 6.536609E+00 | loss scale: 16384.0 | grad norm: 70623.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39543 |
+
time (ms)
|
39544 |
+
iteration 3945/ 159576 | consumed samples: 78992 | elapsed time per iteration (ms): 14519.8 | learning rate: 2.187E-05 | global batch size: 32 | lm loss: 6.631818E+00 | loss scale: 16384.0 | grad norm: 62267.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39545 |
+
time (ms)
|
39546 |
+
iteration 3946/ 159576 | consumed samples: 79024 | elapsed time per iteration (ms): 14592.1 | learning rate: 2.188E-05 | global batch size: 32 | lm loss: 6.263665E+00 | loss scale: 16384.0 | grad norm: 67107.842 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39547 |
+
time (ms)
|
39548 |
+
iteration 3947/ 159576 | consumed samples: 79056 | elapsed time per iteration (ms): 14791.6 | learning rate: 2.189E-05 | global batch size: 32 | lm loss: 6.622372E+00 | loss scale: 16384.0 | grad norm: 84764.799 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39549 |
+
time (ms)
|
39550 |
+
iteration 3948/ 159576 | consumed samples: 79088 | elapsed time per iteration (ms): 14637.3 | learning rate: 2.190E-05 | global batch size: 32 | lm loss: 6.395759E+00 | loss scale: 16384.0 | grad norm: 60113.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39551 |
+
time (ms)
|
39552 |
+
iteration 3949/ 159576 | consumed samples: 79120 | elapsed time per iteration (ms): 14546.6 | learning rate: 2.191E-05 | global batch size: 32 | lm loss: 6.588756E+00 | loss scale: 16384.0 | grad norm: 68679.133 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39553 |
+
time (ms)
|
39554 |
+
iteration 3950/ 159576 | consumed samples: 79152 | elapsed time per iteration (ms): 14514.6 | learning rate: 2.192E-05 | global batch size: 32 | lm loss: 6.484011E+00 | loss scale: 16384.0 | grad norm: 68729.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39555 |
+
time (ms)
|
39556 |
+
iteration 3951/ 159576 | consumed samples: 79184 | elapsed time per iteration (ms): 14907.8 | learning rate: 2.193E-05 | global batch size: 32 | lm loss: 6.496289E+00 | loss scale: 16384.0 | grad norm: 58918.789 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39557 |
+
time (ms)
|
39558 |
+
iteration 3952/ 159576 | consumed samples: 79216 | elapsed time per iteration (ms): 14467.7 | learning rate: 2.194E-05 | global batch size: 32 | lm loss: 6.442475E+00 | loss scale: 16384.0 | grad norm: 73240.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39559 |
+
time (ms)
|
39560 |
+
iteration 3953/ 159576 | consumed samples: 79248 | elapsed time per iteration (ms): 14613.3 | learning rate: 2.195E-05 | global batch size: 32 | lm loss: 6.412640E+00 | loss scale: 16384.0 | grad norm: 63495.861 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39561 |
+
time (ms)
|
39562 |
+
iteration 3954/ 159576 | consumed samples: 79280 | elapsed time per iteration (ms): 14497.1 | learning rate: 2.195E-05 | global batch size: 32 | lm loss: 6.419092E+00 | loss scale: 16384.0 | grad norm: 64832.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39563 |
+
time (ms)
|
39564 |
+
iteration 3955/ 159576 | consumed samples: 79312 | elapsed time per iteration (ms): 14864.8 | learning rate: 2.196E-05 | global batch size: 32 | lm loss: 6.411493E+00 | loss scale: 16384.0 | grad norm: 70227.738 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39565 |
+
time (ms)
|
39566 |
+
iteration 3956/ 159576 | consumed samples: 79344 | elapsed time per iteration (ms): 14501.1 | learning rate: 2.197E-05 | global batch size: 32 | lm loss: 6.377773E+00 | loss scale: 16384.0 | grad norm: 65521.131 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39567 |
+
time (ms)
|
39568 |
+
iteration 3957/ 159576 | consumed samples: 79376 | elapsed time per iteration (ms): 14522.7 | learning rate: 2.198E-05 | global batch size: 32 | lm loss: 6.458980E+00 | loss scale: 16384.0 | grad norm: 62294.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39569 |
+
time (ms)
|
39570 |
+
iteration 3958/ 159576 | consumed samples: 79408 | elapsed time per iteration (ms): 14509.2 | learning rate: 2.199E-05 | global batch size: 32 | lm loss: 6.540348E+00 | loss scale: 16384.0 | grad norm: 64994.102 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39571 |
+
time (ms)
|
39572 |
+
iteration 3959/ 159576 | consumed samples: 79440 | elapsed time per iteration (ms): 14868.7 | learning rate: 2.200E-05 | global batch size: 32 | lm loss: 6.503858E+00 | loss scale: 16384.0 | grad norm: 54271.909 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39573 |
+
time (ms)
|
39574 |
+
iteration 3960/ 159576 | consumed samples: 79472 | elapsed time per iteration (ms): 14512.5 | learning rate: 2.201E-05 | global batch size: 32 | lm loss: 6.372645E+00 | loss scale: 16384.0 | grad norm: 73237.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39575 |
+
time (ms)
|
39576 |
+
iteration 3961/ 159576 | consumed samples: 79504 | elapsed time per iteration (ms): 14552.3 | learning rate: 2.202E-05 | global batch size: 32 | lm loss: 6.396554E+00 | loss scale: 16384.0 | grad norm: 64579.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39577 |
+
time (ms)
|
39578 |
+
iteration 3962/ 159576 | consumed samples: 79536 | elapsed time per iteration (ms): 14559.3 | learning rate: 2.203E-05 | global batch size: 32 | lm loss: 6.556979E+00 | loss scale: 16384.0 | grad norm: 83489.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39579 |
+
time (ms)
|
39580 |
+
iteration 3963/ 159576 | consumed samples: 79568 | elapsed time per iteration (ms): 14899.9 | learning rate: 2.203E-05 | global batch size: 32 | lm loss: 6.458327E+00 | loss scale: 16384.0 | grad norm: 58716.823 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39581 |
+
time (ms)
|
39582 |
+
iteration 3964/ 159576 | consumed samples: 79600 | elapsed time per iteration (ms): 14539.5 | learning rate: 2.204E-05 | global batch size: 32 | lm loss: 6.802517E+00 | loss scale: 16384.0 | grad norm: 60731.153 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39583 |
+
time (ms)
|
39584 |
+
iteration 3965/ 159576 | consumed samples: 79632 | elapsed time per iteration (ms): 14520.1 | learning rate: 2.205E-05 | global batch size: 32 | lm loss: 6.616902E+00 | loss scale: 16384.0 | grad norm: 64155.719 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39585 |
+
time (ms)
|
39586 |
+
iteration 3966/ 159576 | consumed samples: 79664 | elapsed time per iteration (ms): 14585.2 | learning rate: 2.206E-05 | global batch size: 32 | lm loss: 6.457995E+00 | loss scale: 16384.0 | grad norm: 74880.971 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39587 |
+
time (ms)
|
39588 |
+
iteration 3967/ 159576 | consumed samples: 79696 | elapsed time per iteration (ms): 14850.0 | learning rate: 2.207E-05 | global batch size: 32 | lm loss: 6.591904E+00 | loss scale: 16384.0 | grad norm: 75336.614 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39589 |
+
time (ms)
|
39590 |
+
iteration 3968/ 159576 | consumed samples: 79728 | elapsed time per iteration (ms): 14661.7 | learning rate: 2.208E-05 | global batch size: 32 | lm loss: 6.475752E+00 | loss scale: 16384.0 | grad norm: 76852.677 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39591 |
+
time (ms)
|
39592 |
+
iteration 3969/ 159576 | consumed samples: 79760 | elapsed time per iteration (ms): 14523.7 | learning rate: 2.209E-05 | global batch size: 32 | lm loss: 6.452621E+00 | loss scale: 16384.0 | grad norm: 65844.475 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39593 |
+
time (ms)
|
39594 |
+
iteration 3970/ 159576 | consumed samples: 79792 | elapsed time per iteration (ms): 14549.1 | learning rate: 2.210E-05 | global batch size: 32 | lm loss: 6.401618E+00 | loss scale: 16384.0 | grad norm: 84954.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39595 |
+
time (ms)
|
39596 |
+
iteration 3971/ 159576 | consumed samples: 79824 | elapsed time per iteration (ms): 14508.8 | learning rate: 2.211E-05 | global batch size: 32 | lm loss: 6.516178E+00 | loss scale: 16384.0 | grad norm: 71111.037 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39597 |
+
time (ms)
|
39598 |
+
iteration 3972/ 159576 | consumed samples: 79856 | elapsed time per iteration (ms): 14847.5 | learning rate: 2.211E-05 | global batch size: 32 | lm loss: 6.601567E+00 | loss scale: 16384.0 | grad norm: 74563.765 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39599 |
+
time (ms)
|
39600 |
+
iteration 3973/ 159576 | consumed samples: 79888 | elapsed time per iteration (ms): 14594.0 | learning rate: 2.212E-05 | global batch size: 32 | lm loss: 6.441951E+00 | loss scale: 16384.0 | grad norm: 72653.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39601 |
+
time (ms)
|
39602 |
+
iteration 3974/ 159576 | consumed samples: 79920 | elapsed time per iteration (ms): 14478.4 | learning rate: 2.213E-05 | global batch size: 32 | lm loss: 6.510294E+00 | loss scale: 16384.0 | grad norm: 65083.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39603 |
+
time (ms)
|
39604 |
+
iteration 3975/ 159576 | consumed samples: 79952 | elapsed time per iteration (ms): 14520.1 | learning rate: 2.214E-05 | global batch size: 32 | lm loss: 6.345959E+00 | loss scale: 16384.0 | grad norm: 133600.019 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39605 |
+
time (ms)
|
39606 |
+
iteration 3976/ 159576 | consumed samples: 79984 | elapsed time per iteration (ms): 14770.3 | learning rate: 2.215E-05 | global batch size: 32 | lm loss: 6.477483E+00 | loss scale: 16384.0 | grad norm: 89443.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39607 |
+
time (ms)
|
39608 |
+
iteration 3977/ 159576 | consumed samples: 80016 | elapsed time per iteration (ms): 14483.7 | learning rate: 2.216E-05 | global batch size: 32 | lm loss: 6.466526E+00 | loss scale: 16384.0 | grad norm: 79203.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39609 |
+
time (ms)
|
39610 |
+
iteration 3978/ 159576 | consumed samples: 80048 | elapsed time per iteration (ms): 14548.9 | learning rate: 2.217E-05 | global batch size: 32 | lm loss: 6.490917E+00 | loss scale: 16384.0 | grad norm: 85035.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39611 |
+
time (ms)
|
39612 |
+
iteration 3979/ 159576 | consumed samples: 80080 | elapsed time per iteration (ms): 14519.8 | learning rate: 2.218E-05 | global batch size: 32 | lm loss: 6.412145E+00 | loss scale: 16384.0 | grad norm: 93580.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39613 |
+
time (ms)
|
39614 |
+
iteration 3980/ 159576 | consumed samples: 80112 | elapsed time per iteration (ms): 14659.7 | learning rate: 2.218E-05 | global batch size: 32 | lm loss: 6.473646E+00 | loss scale: 16384.0 | grad norm: 79422.522 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39615 |
+
time (ms)
|
39616 |
+
iteration 3981/ 159576 | consumed samples: 80144 | elapsed time per iteration (ms): 14525.1 | learning rate: 2.219E-05 | global batch size: 32 | lm loss: 6.522334E+00 | loss scale: 16384.0 | grad norm: 83533.865 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39617 |
+
time (ms)
|
39618 |
+
iteration 3982/ 159576 | consumed samples: 80176 | elapsed time per iteration (ms): 14543.1 | learning rate: 2.220E-05 | global batch size: 32 | lm loss: 6.387228E+00 | loss scale: 16384.0 | grad norm: 89795.957 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39619 |
+
time (ms)
|
39620 |
+
iteration 3983/ 159576 | consumed samples: 80208 | elapsed time per iteration (ms): 14609.8 | learning rate: 2.221E-05 | global batch size: 32 | lm loss: 6.475267E+00 | loss scale: 16384.0 | grad norm: 119598.589 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39621 |
+
time (ms)
|
39622 |
+
iteration 3984/ 159576 | consumed samples: 80240 | elapsed time per iteration (ms): 14596.2 | learning rate: 2.222E-05 | global batch size: 32 | lm loss: 6.533351E+00 | loss scale: 16384.0 | grad norm: 72306.036 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39623 |
+
time (ms)
|
39624 |
+
iteration 3985/ 159576 | consumed samples: 80272 | elapsed time per iteration (ms): 14621.5 | learning rate: 2.223E-05 | global batch size: 32 | lm loss: 6.540237E+00 | loss scale: 16384.0 | grad norm: 88358.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39625 |
+
time (ms)
|
39626 |
+
iteration 3986/ 159576 | consumed samples: 80304 | elapsed time per iteration (ms): 14563.8 | learning rate: 2.224E-05 | global batch size: 32 | lm loss: 6.419699E+00 | loss scale: 16384.0 | grad norm: 75411.849 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39627 |
+
time (ms)
|
39628 |
+
iteration 3987/ 159576 | consumed samples: 80336 | elapsed time per iteration (ms): 14555.9 | learning rate: 2.225E-05 | global batch size: 32 | lm loss: 6.591748E+00 | loss scale: 16384.0 | grad norm: 112139.715 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39629 |
+
time (ms)
|
39630 |
+
iteration 3988/ 159576 | consumed samples: 80368 | elapsed time per iteration (ms): 15004.4 | learning rate: 2.226E-05 | global batch size: 32 | lm loss: 6.551664E+00 | loss scale: 16384.0 | grad norm: 88397.931 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39631 |
+
time (ms)
|
39632 |
+
iteration 3989/ 159576 | consumed samples: 80400 | elapsed time per iteration (ms): 14610.9 | learning rate: 2.226E-05 | global batch size: 32 | lm loss: 6.531049E+00 | loss scale: 16384.0 | grad norm: 63924.116 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39633 |
+
time (ms)
|
39634 |
+
iteration 3990/ 159576 | consumed samples: 80432 | elapsed time per iteration (ms): 14532.5 | learning rate: 2.227E-05 | global batch size: 32 | lm loss: 6.546918E+00 | loss scale: 16384.0 | grad norm: 97299.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39635 |
+
time (ms)
|
39636 |
+
iteration 3991/ 159576 | consumed samples: 80464 | elapsed time per iteration (ms): 14437.4 | learning rate: 2.228E-05 | global batch size: 32 | lm loss: 6.471569E+00 | loss scale: 16384.0 | grad norm: 76326.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39637 |
+
time (ms)
|
39638 |
+
iteration 3992/ 159576 | consumed samples: 80496 | elapsed time per iteration (ms): 14906.8 | learning rate: 2.229E-05 | global batch size: 32 | lm loss: 6.525407E+00 | loss scale: 16384.0 | grad norm: 77183.511 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39639 |
+
time (ms)
|
39640 |
+
iteration 3993/ 159576 | consumed samples: 80528 | elapsed time per iteration (ms): 14534.2 | learning rate: 2.230E-05 | global batch size: 32 | lm loss: 6.539597E+00 | loss scale: 16384.0 | grad norm: 60376.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39641 |
+
time (ms)
|
39642 |
+
iteration 3994/ 159576 | consumed samples: 80560 | elapsed time per iteration (ms): 14579.3 | learning rate: 2.231E-05 | global batch size: 32 | lm loss: 6.552666E+00 | loss scale: 16384.0 | grad norm: 84746.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39643 |
+
time (ms)
|
39644 |
+
iteration 3995/ 159576 | consumed samples: 80592 | elapsed time per iteration (ms): 14529.3 | learning rate: 2.232E-05 | global batch size: 32 | lm loss: 6.413946E+00 | loss scale: 16384.0 | grad norm: 67969.641 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39645 |
+
time (ms)
|
39646 |
+
iteration 3996/ 159576 | consumed samples: 80624 | elapsed time per iteration (ms): 14922.8 | learning rate: 2.233E-05 | global batch size: 32 | lm loss: 6.427969E+00 | loss scale: 16384.0 | grad norm: 69649.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39647 |
+
time (ms)
|
39648 |
+
iteration 3997/ 159576 | consumed samples: 80656 | elapsed time per iteration (ms): 14490.2 | learning rate: 2.234E-05 | global batch size: 32 | lm loss: 6.527985E+00 | loss scale: 16384.0 | grad norm: 71644.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39649 |
+
time (ms)
|
39650 |
+
iteration 3998/ 159576 | consumed samples: 80688 | elapsed time per iteration (ms): 14523.9 | learning rate: 2.234E-05 | global batch size: 32 | lm loss: 6.565372E+00 | loss scale: 16384.0 | grad norm: 75823.635 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39651 |
+
time (ms)
|
39652 |
+
iteration 3999/ 159576 | consumed samples: 80720 | elapsed time per iteration (ms): 14567.0 | learning rate: 2.235E-05 | global batch size: 32 | lm loss: 6.377391E+00 | loss scale: 16384.0 | grad norm: 85967.590 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39653 |
+
time (ms)
|
39654 |
+
[2021-09-24 17:49:17,238] [INFO] [logging.py:68:log_dist] [Rank 0] step=4000, skipped=6, lr=[2.2362426035502958e-05, 2.2362426035502958e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
|
39655 |
+
steps: 4000 loss: 6.3572 iter time (s): 0.007 samples/sec: 4556.985
|
39656 |
+
iteration 4000/ 159576 | consumed samples: 80752 | elapsed time per iteration (ms): 14858.7 | learning rate: 2.236E-05 | global batch size: 32 | lm loss: 6.357164E+00 | loss scale: 16384.0 | grad norm: 56878.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39657 |
+
time (ms)
|
39658 |
+
------------------------------------------------------------------------------------------------
|
39659 |
+
validation loss at iteration 4000 | lm loss value: 6.453290E+00 | lm loss PPL: 6.347876E+02 |
|
39660 |
+
------------------------------------------------------------------------------------------------
|
39661 |
+
iteration 4001/ 159576 | consumed samples: 80784 | elapsed time per iteration (ms): 20796.3 | learning rate: 2.237E-05 | global batch size: 32 | lm loss: 6.357805E+00 | loss scale: 16384.0 | grad norm: 75271.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39662 |
+
time (ms)
|
39663 |
+
iteration 4002/ 159576 | consumed samples: 80816 | elapsed time per iteration (ms): 14528.3 | learning rate: 2.238E-05 | global batch size: 32 | lm loss: 6.590372E+00 | loss scale: 16384.0 | grad norm: 82823.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39664 |
+
time (ms)
|
39665 |
+
iteration 4003/ 159576 | consumed samples: 80848 | elapsed time per iteration (ms): 14569.0 | learning rate: 2.239E-05 | global batch size: 32 | lm loss: 6.547601E+00 | loss scale: 16384.0 | grad norm: 63495.848 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39666 |
+
time (ms)
|
39667 |
+
iteration 4004/ 159576 | consumed samples: 80880 | elapsed time per iteration (ms): 14981.7 | learning rate: 2.240E-05 | global batch size: 32 | lm loss: 6.488581E+00 | loss scale: 16384.0 | grad norm: 84538.823 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39668 |
+
time (ms)
|
39669 |
+
iteration 4005/ 159576 | consumed samples: 80912 | elapsed time per iteration (ms): 14517.6 | learning rate: 2.241E-05 | global batch size: 32 | lm loss: 6.473035E+00 | loss scale: 16384.0 | grad norm: 69154.929 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39670 |
+
time (ms)
|
39671 |
+
iteration 4006/ 159576 | consumed samples: 80944 | elapsed time per iteration (ms): 14515.3 | learning rate: 2.242E-05 | global batch size: 32 | lm loss: 6.574604E+00 | loss scale: 16384.0 | grad norm: 71258.786 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39672 |
+
time (ms)
|
39673 |
+
iteration 4007/ 159576 | consumed samples: 80976 | elapsed time per iteration (ms): 14530.3 | learning rate: 2.242E-05 | global batch size: 32 | lm loss: 6.480978E+00 | loss scale: 16384.0 | grad norm: 63598.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39674 |
+
time (ms)
|
39675 |
+
iteration 4008/ 159576 | consumed samples: 81008 | elapsed time per iteration (ms): 15052.4 | learning rate: 2.243E-05 | global batch size: 32 | lm loss: 6.393389E+00 | loss scale: 16384.0 | grad norm: 76474.916 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39676 |
+
time (ms)
|
39677 |
+
iteration 4009/ 159576 | consumed samples: 81040 | elapsed time per iteration (ms): 14618.9 | learning rate: 2.244E-05 | global batch size: 32 | lm loss: 6.322450E+00 | loss scale: 16384.0 | grad norm: 62736.146 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
39678 |
+
time (ms)
|
39679 |
+
iteration 4010/ 159576 | consumed samples: 81072 | elapsed time per iteration (ms): 14521.7 | learning rate: 2.245E-05 | global batch size: 32 | lm loss: 6.502364E+00 | loss scale: 16384.0 | grad norm: 78751.861 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4011/ 159576 | consumed samples: 81104 | elapsed time per iteration (ms): 14513.4 | learning rate: 2.246E-05 | global batch size: 32 | lm loss: 6.504915E+00 | loss scale: 16384.0 | grad norm: 73290.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4012/ 159576 | consumed samples: 81136 | elapsed time per iteration (ms): 14859.5 | learning rate: 2.247E-05 | global batch size: 32 | lm loss: 6.422670E+00 | loss scale: 16384.0 | grad norm: 70911.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4013/ 159576 | consumed samples: 81168 | elapsed time per iteration (ms): 14562.7 | learning rate: 2.248E-05 | global batch size: 32 | lm loss: 6.460926E+00 | loss scale: 16384.0 | grad norm: 88361.679 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4014/ 159576 | consumed samples: 81200 | elapsed time per iteration (ms): 14537.6 | learning rate: 2.249E-05 | global batch size: 32 | lm loss: 6.359708E+00 | loss scale: 16384.0 | grad norm: 70950.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4015/ 159576 | consumed samples: 81232 | elapsed time per iteration (ms): 14575.5 | learning rate: 2.250E-05 | global batch size: 32 | lm loss: 6.479752E+00 | loss scale: 16384.0 | grad norm: 60916.908 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4016/ 159576 | consumed samples: 81264 | elapsed time per iteration (ms): 14890.4 | learning rate: 2.250E-05 | global batch size: 32 | lm loss: 6.438080E+00 | loss scale: 16384.0 | grad norm: 78503.860 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4017/ 159576 | consumed samples: 81296 | elapsed time per iteration (ms): 14519.4 | learning rate: 2.251E-05 | global batch size: 32 | lm loss: 6.446492E+00 | loss scale: 16384.0 | grad norm: 66299.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4018/ 159576 | consumed samples: 81328 | elapsed time per iteration (ms): 14512.9 | learning rate: 2.252E-05 | global batch size: 32 | lm loss: 6.418320E+00 | loss scale: 16384.0 | grad norm: 65936.043 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4019/ 159576 | consumed samples: 81360 | elapsed time per iteration (ms): 14568.1 | learning rate: 2.253E-05 | global batch size: 32 | lm loss: 6.337445E+00 | loss scale: 16384.0 | grad norm: 71727.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4020/ 159576 | consumed samples: 81392 | elapsed time per iteration (ms): 14867.3 | learning rate: 2.254E-05 | global batch size: 32 | lm loss: 6.564549E+00 | loss scale: 16384.0 | grad norm: 96122.107 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4021/ 159576 | consumed samples: 81424 | elapsed time per iteration (ms): 14435.4 | learning rate: 2.255E-05 | global batch size: 32 | lm loss: 6.485852E+00 | loss scale: 16384.0 | grad norm: 82597.736 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4022/ 159576 | consumed samples: 81456 | elapsed time per iteration (ms): 14558.0 | learning rate: 2.256E-05 | global batch size: 32 | lm loss: 6.539099E+00 | loss scale: 16384.0 | grad norm: 121006.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4023/ 159576 | consumed samples: 81488 | elapsed time per iteration (ms): 14530.8 | learning rate: 2.257E-05 | global batch size: 32 | lm loss: 6.588836E+00 | loss scale: 16384.0 | grad norm: 83990.530 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4024/ 159576 | consumed samples: 81520 | elapsed time per iteration (ms): 14903.1 | learning rate: 2.258E-05 | global batch size: 32 | lm loss: 6.478038E+00 | loss scale: 16384.0 | grad norm: 86310.728 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4025/ 159576 | consumed samples: 81552 | elapsed time per iteration (ms): 14640.8 | learning rate: 2.258E-05 | global batch size: 32 | lm loss: 6.423618E+00 | loss scale: 16384.0 | grad norm: 72646.553 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4026/ 159576 | consumed samples: 81584 | elapsed time per iteration (ms): 14523.1 | learning rate: 2.259E-05 | global batch size: 32 | lm loss: 6.389876E+00 | loss scale: 16384.0 | grad norm: 75260.682 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4027/ 159576 | consumed samples: 81616 | elapsed time per iteration (ms): 14495.3 | learning rate: 2.260E-05 | global batch size: 32 | lm loss: 6.686980E+00 | loss scale: 16384.0 | grad norm: 68901.893 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4028/ 159576 | consumed samples: 81648 | elapsed time per iteration (ms): 14518.7 | learning rate: 2.261E-05 | global batch size: 32 | lm loss: 6.454273E+00 | loss scale: 16384.0 | grad norm: 78058.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4029/ 159576 | consumed samples: 81680 | elapsed time per iteration (ms): 14751.7 | learning rate: 2.262E-05 | global batch size: 32 | lm loss: 6.645922E+00 | loss scale: 16384.0 | grad norm: 90877.563 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4030/ 159576 | consumed samples: 81712 | elapsed time per iteration (ms): 14605.8 | learning rate: 2.263E-05 | global batch size: 32 | lm loss: 6.554152E+00 | loss scale: 16384.0 | grad norm: 71333.048 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4031/ 159576 | consumed samples: 81744 | elapsed time per iteration (ms): 14567.0 | learning rate: 2.264E-05 | global batch size: 32 | lm loss: 6.512757E+00 | loss scale: 16384.0 | grad norm: 75409.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4032/ 159576 | consumed samples: 81776 | elapsed time per iteration (ms): 14627.7 | learning rate: 2.265E-05 | global batch size: 32 | lm loss: 6.529600E+00 | loss scale: 16384.0 | grad norm: 83852.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4033/ 159576 | consumed samples: 81808 | elapsed time per iteration (ms): 14706.7 | learning rate: 2.266E-05 | global batch size: 32 | lm loss: 6.312231E+00 | loss scale: 16384.0 | grad norm: 64610.818 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4034/ 159576 | consumed samples: 81840 | elapsed time per iteration (ms): 14453.1 | learning rate: 2.266E-05 | global batch size: 32 | lm loss: 6.378237E+00 | loss scale: 16384.0 | grad norm: 70363.183 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4035/ 159576 | consumed samples: 81872 | elapsed time per iteration (ms): 14558.4 | learning rate: 2.267E-05 | global batch size: 32 | lm loss: 6.617406E+00 | loss scale: 16384.0 | grad norm: 76776.869 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4036/ 159576 | consumed samples: 81904 | elapsed time per iteration (ms): 14451.4 | learning rate: 2.268E-05 | global batch size: 32 | lm loss: 6.510260E+00 | loss scale: 16384.0 | grad norm: 65763.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4037/ 159576 | consumed samples: 81936 | elapsed time per iteration (ms): 14734.4 | learning rate: 2.269E-05 | global batch size: 32 | lm loss: 6.484540E+00 | loss scale: 16384.0 | grad norm: 113964.842 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4038/ 159576 | consumed samples: 81968 | elapsed time per iteration (ms): 14560.9 | learning rate: 2.270E-05 | global batch size: 32 | lm loss: 6.422564E+00 | loss scale: 16384.0 | grad norm: 71196.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4039/ 159576 | consumed samples: 82000 | elapsed time per iteration (ms): 14521.4 | learning rate: 2.271E-05 | global batch size: 32 | lm loss: 6.468810E+00 | loss scale: 16384.0 | grad norm: 81464.635 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4040/ 159576 | consumed samples: 82032 | elapsed time per iteration (ms): 14534.9 | learning rate: 2.272E-05 | global batch size: 32 | lm loss: 6.528829E+00 | loss scale: 16384.0 | grad norm: 64883.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4041/ 159576 | consumed samples: 82064 | elapsed time per iteration (ms): 14840.7 | learning rate: 2.273E-05 | global batch size: 32 | lm loss: 6.466451E+00 | loss scale: 16384.0 | grad norm: 113319.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4042/ 159576 | consumed samples: 82096 | elapsed time per iteration (ms): 14627.3 | learning rate: 2.274E-05 | global batch size: 32 | lm loss: 6.455089E+00 | loss scale: 16384.0 | grad norm: 63704.855 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4043/ 159576 | consumed samples: 82128 | elapsed time per iteration (ms): 14401.0 | learning rate: 2.274E-05 | global batch size: 32 | lm loss: 6.394213E+00 | loss scale: 16384.0 | grad norm: 104510.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4044/ 159576 | consumed samples: 82160 | elapsed time per iteration (ms): 14522.2 | learning rate: 2.275E-05 | global batch size: 32 | lm loss: 6.436733E+00 | loss scale: 16384.0 | grad norm: 69916.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4045/ 159576 | consumed samples: 82192 | elapsed time per iteration (ms): 14878.3 | learning rate: 2.276E-05 | global batch size: 32 | lm loss: 6.467334E+00 | loss scale: 16384.0 | grad norm: 86814.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4046/ 159576 | consumed samples: 82224 | elapsed time per iteration (ms): 14619.5 | learning rate: 2.277E-05 | global batch size: 32 | lm loss: 6.542828E+00 | loss scale: 16384.0 | grad norm: 91169.836 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4047/ 159576 | consumed samples: 82256 | elapsed time per iteration (ms): 14546.0 | learning rate: 2.278E-05 | global batch size: 32 | lm loss: 6.482902E+00 | loss scale: 16384.0 | grad norm: 71855.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4048/ 159576 | consumed samples: 82288 | elapsed time per iteration (ms): 14535.3 | learning rate: 2.279E-05 | global batch size: 32 | lm loss: 6.380974E+00 | loss scale: 16384.0 | grad norm: 110448.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4049/ 159576 | consumed samples: 82320 | elapsed time per iteration (ms): 14946.7 | learning rate: 2.280E-05 | global batch size: 32 | lm loss: 6.604033E+00 | loss scale: 16384.0 | grad norm: 86973.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4050/ 159576 | consumed samples: 82352 | elapsed time per iteration (ms): 14452.3 | learning rate: 2.281E-05 | global batch size: 32 | lm loss: 6.485418E+00 | loss scale: 16384.0 | grad norm: 93547.929 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4051/ 159576 | consumed samples: 82384 | elapsed time per iteration (ms): 14486.7 | learning rate: 2.282E-05 | global batch size: 32 | lm loss: 6.447795E+00 | loss scale: 16384.0 | grad norm: 71623.174 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4052/ 159576 | consumed samples: 82416 | elapsed time per iteration (ms): 14546.0 | learning rate: 2.282E-05 | global batch size: 32 | lm loss: 6.490433E+00 | loss scale: 16384.0 | grad norm: 122748.723 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4053/ 159576 | consumed samples: 82448 | elapsed time per iteration (ms): 14923.8 | learning rate: 2.283E-05 | global batch size: 32 | lm loss: 6.393107E+00 | loss scale: 16384.0 | grad norm: 94716.038 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4054/ 159576 | consumed samples: 82480 | elapsed time per iteration (ms): 14522.3 | learning rate: 2.284E-05 | global batch size: 32 | lm loss: 6.560749E+00 | loss scale: 16384.0 | grad norm: 87911.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4055/ 159576 | consumed samples: 82512 | elapsed time per iteration (ms): 14576.1 | learning rate: 2.285E-05 | global batch size: 32 | lm loss: 6.508199E+00 | loss scale: 16384.0 | grad norm: 75712.942 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4056/ 159576 | consumed samples: 82544 | elapsed time per iteration (ms): 14509.2 | learning rate: 2.286E-05 | global batch size: 32 | lm loss: 6.480619E+00 | loss scale: 16384.0 | grad norm: 92968.738 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4057/ 159576 | consumed samples: 82576 | elapsed time per iteration (ms): 14814.4 | learning rate: 2.287E-05 | global batch size: 32 | lm loss: 6.324226E+00 | loss scale: 16384.0 | grad norm: 78472.900 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4058/ 159576 | consumed samples: 82608 | elapsed time per iteration (ms): 14459.3 | learning rate: 2.288E-05 | global batch size: 32 | lm loss: 6.626959E+00 | loss scale: 16384.0 | grad norm: 80531.732 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4059/ 159576 | consumed samples: 82640 | elapsed time per iteration (ms): 14496.4 | learning rate: 2.289E-05 | global batch size: 32 | lm loss: 6.406682E+00 | loss scale: 16384.0 | grad norm: 75308.856 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4060/ 159576 | consumed samples: 82672 | elapsed time per iteration (ms): 14562.2 | learning rate: 2.289E-05 | global batch size: 32 | lm loss: 6.440542E+00 | loss scale: 16384.0 | grad norm: 78114.884 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4061/ 159576 | consumed samples: 82704 | elapsed time per iteration (ms): 14796.0 | learning rate: 2.290E-05 | global batch size: 32 | lm loss: 6.468933E+00 | loss scale: 16384.0 | grad norm: 77154.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4062/ 159576 | consumed samples: 82736 | elapsed time per iteration (ms): 14696.5 | learning rate: 2.291E-05 | global batch size: 32 | lm loss: 6.318196E+00 | loss scale: 16384.0 | grad norm: 97551.121 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4063/ 159576 | consumed samples: 82768 | elapsed time per iteration (ms): 14468.1 | learning rate: 2.292E-05 | global batch size: 32 | lm loss: 6.472930E+00 | loss scale: 16384.0 | grad norm: 110041.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4064/ 159576 | consumed samples: 82800 | elapsed time per iteration (ms): 14496.2 | learning rate: 2.293E-05 | global batch size: 32 | lm loss: 6.523721E+00 | loss scale: 16384.0 | grad norm: 88018.768 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4065/ 159576 | consumed samples: 82832 | elapsed time per iteration (ms): 14563.8 | learning rate: 2.294E-05 | global batch size: 32 | lm loss: 6.453180E+00 | loss scale: 16384.0 | grad norm: 83087.922 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4066/ 159576 | consumed samples: 82864 | elapsed time per iteration (ms): 14884.4 | learning rate: 2.295E-05 | global batch size: 32 | lm loss: 6.447326E+00 | loss scale: 16384.0 | grad norm: 72433.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4067/ 159576 | consumed samples: 82896 | elapsed time per iteration (ms): 14491.5 | learning rate: 2.296E-05 | global batch size: 32 | lm loss: 6.366633E+00 | loss scale: 16384.0 | grad norm: 100504.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4068/ 159576 | consumed samples: 82928 | elapsed time per iteration (ms): 14561.6 | learning rate: 2.297E-05 | global batch size: 32 | lm loss: 6.315294E+00 | loss scale: 16384.0 | grad norm: 79868.630 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4069/ 159576 | consumed samples: 82960 | elapsed time per iteration (ms): 14538.6 | learning rate: 2.297E-05 | global batch size: 32 | lm loss: 6.452709E+00 | loss scale: 16384.0 | grad norm: 94073.963 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4070/ 159576 | consumed samples: 82992 | elapsed time per iteration (ms): 14651.1 | learning rate: 2.298E-05 | global batch size: 32 | lm loss: 6.421084E+00 | loss scale: 16384.0 | grad norm: 96558.906 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4071/ 159576 | consumed samples: 83024 | elapsed time per iteration (ms): 14508.0 | learning rate: 2.299E-05 | global batch size: 32 | lm loss: 6.474918E+00 | loss scale: 16384.0 | grad norm: 104437.791 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 4072/ 159576 | consumed samples: 83056 | elapsed time per iteration (ms): 14540.3 | learning rate: 2.300E-05 | global batch size: 32 | lm loss: 6.442264E+00 | loss scale: 16384.0 | grad norm: 69985.883 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
[2021-09-24 18:07:07] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-24 18:07:07] PULSE: tr8-104B is running for 12:14:56 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])