bigscience-bot committed on
Commit 404a45c
Parent(s): 874d5b2
new data
logs/main_log.txt +442 -0
@@ -42639,3 +42639,445 @@ time (ms)
 time (ms)
 [2021-09-25 00:07:47] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
 [2021-09-25 00:07:47] PULSE: tr8-104B is running for 18:15:36 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
+iteration 5481/ 159576 | consumed samples: 146032 | elapsed time per iteration (ms): 16373.0 | learning rate: 4.042E-05 | global batch size: 64 | lm loss: 6.280901E+00 | loss scale: 4096.0 | grad norm: 89214.030 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5482/ 159576 | consumed samples: 146096 | elapsed time per iteration (ms): 16391.1 | learning rate: 4.044E-05 | global batch size: 64 | lm loss: 6.407492E+00 | loss scale: 4096.0 | grad norm: 71190.860 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5483/ 159576 | consumed samples: 146160 | elapsed time per iteration (ms): 16510.6 | learning rate: 4.046E-05 | global batch size: 64 | lm loss: 6.338043E+00 | loss scale: 4096.0 | grad norm: 80052.869 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5484/ 159576 | consumed samples: 146224 | elapsed time per iteration (ms): 16428.2 | learning rate: 4.047E-05 | global batch size: 64 | lm loss: 6.381162E+00 | loss scale: 4096.0 | grad norm: 66785.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5485/ 159576 | consumed samples: 146288 | elapsed time per iteration (ms): 16390.1 | learning rate: 4.049E-05 | global batch size: 64 | lm loss: 6.377982E+00 | loss scale: 4096.0 | grad norm: 73739.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5486/ 159576 | consumed samples: 146352 | elapsed time per iteration (ms): 16772.0 | learning rate: 4.051E-05 | global batch size: 64 | lm loss: 6.417017E+00 | loss scale: 4096.0 | grad norm: 101012.887 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5487/ 159576 | consumed samples: 146416 | elapsed time per iteration (ms): 16505.3 | learning rate: 4.053E-05 | global batch size: 64 | lm loss: 6.375125E+00 | loss scale: 4096.0 | grad norm: 62796.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5488/ 159576 | consumed samples: 146480 | elapsed time per iteration (ms): 16398.9 | learning rate: 4.054E-05 | global batch size: 64 | lm loss: 6.370068E+00 | loss scale: 4096.0 | grad norm: 53653.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5489/ 159576 | consumed samples: 146544 | elapsed time per iteration (ms): 16369.7 | learning rate: 4.056E-05 | global batch size: 64 | lm loss: 6.376281E+00 | loss scale: 4096.0 | grad norm: 81099.504 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5490/ 159576 | consumed samples: 146608 | elapsed time per iteration (ms): 16827.2 | learning rate: 4.058E-05 | global batch size: 64 | lm loss: 6.479604E+00 | loss scale: 4096.0 | grad norm: 63855.765 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5491/ 159576 | consumed samples: 146672 | elapsed time per iteration (ms): 16415.6 | learning rate: 4.060E-05 | global batch size: 64 | lm loss: 6.352095E+00 | loss scale: 4096.0 | grad norm: 55122.067 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5492/ 159576 | consumed samples: 146736 | elapsed time per iteration (ms): 16444.9 | learning rate: 4.062E-05 | global batch size: 64 | lm loss: 6.506047E+00 | loss scale: 4096.0 | grad norm: 75137.891 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5493/ 159576 | consumed samples: 146800 | elapsed time per iteration (ms): 16342.5 | learning rate: 4.063E-05 | global batch size: 64 | lm loss: 6.379695E+00 | loss scale: 4096.0 | grad norm: 66901.698 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5494/ 159576 | consumed samples: 146864 | elapsed time per iteration (ms): 16502.1 | learning rate: 4.065E-05 | global batch size: 64 | lm loss: 6.368460E+00 | loss scale: 4096.0 | grad norm: 77897.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5495/ 159576 | consumed samples: 146928 | elapsed time per iteration (ms): 16338.1 | learning rate: 4.067E-05 | global batch size: 64 | lm loss: 6.329938E+00 | loss scale: 4096.0 | grad norm: 61931.764 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5496/ 159576 | consumed samples: 146992 | elapsed time per iteration (ms): 16346.0 | learning rate: 4.069E-05 | global batch size: 64 | lm loss: 6.425272E+00 | loss scale: 4096.0 | grad norm: 66524.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5497/ 159576 | consumed samples: 147056 | elapsed time per iteration (ms): 16765.2 | learning rate: 4.070E-05 | global batch size: 64 | lm loss: 6.296051E+00 | loss scale: 4096.0 | grad norm: 85285.961 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5498/ 159576 | consumed samples: 147120 | elapsed time per iteration (ms): 16329.2 | learning rate: 4.072E-05 | global batch size: 64 | lm loss: 6.365289E+00 | loss scale: 4096.0 | grad norm: 66015.174 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5499/ 159576 | consumed samples: 147184 | elapsed time per iteration (ms): 16383.4 | learning rate: 4.074E-05 | global batch size: 64 | lm loss: 6.294851E+00 | loss scale: 4096.0 | grad norm: 79758.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5500/ 159576 | consumed samples: 147248 | elapsed time per iteration (ms): 16337.1 | learning rate: 4.076E-05 | global batch size: 64 | lm loss: 6.289442E+00 | loss scale: 4096.0 | grad norm: 74687.965 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5501/ 159576 | consumed samples: 147312 | elapsed time per iteration (ms): 16790.4 | learning rate: 4.078E-05 | global batch size: 64 | lm loss: 6.322903E+00 | loss scale: 4096.0 | grad norm: 77364.060 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5502/ 159576 | consumed samples: 147376 | elapsed time per iteration (ms): 16423.5 | learning rate: 4.079E-05 | global batch size: 64 | lm loss: 6.460203E+00 | loss scale: 4096.0 | grad norm: 73803.838 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5503/ 159576 | consumed samples: 147440 | elapsed time per iteration (ms): 16368.8 | learning rate: 4.081E-05 | global batch size: 64 | lm loss: 6.396315E+00 | loss scale: 4096.0 | grad norm: 71129.126 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5504/ 159576 | consumed samples: 147504 | elapsed time per iteration (ms): 16346.2 | learning rate: 4.083E-05 | global batch size: 64 | lm loss: 6.425894E+00 | loss scale: 4096.0 | grad norm: 98647.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5505/ 159576 | consumed samples: 147568 | elapsed time per iteration (ms): 16678.7 | learning rate: 4.085E-05 | global batch size: 64 | lm loss: 6.381792E+00 | loss scale: 4096.0 | grad norm: 89626.671 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5506/ 159576 | consumed samples: 147632 | elapsed time per iteration (ms): 16332.5 | learning rate: 4.086E-05 | global batch size: 64 | lm loss: 6.483613E+00 | loss scale: 4096.0 | grad norm: 94069.099 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5507/ 159576 | consumed samples: 147696 | elapsed time per iteration (ms): 16400.4 | learning rate: 4.088E-05 | global batch size: 64 | lm loss: 6.236539E+00 | loss scale: 4096.0 | grad norm: 66871.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5508/ 159576 | consumed samples: 147760 | elapsed time per iteration (ms): 16657.8 | learning rate: 4.090E-05 | global batch size: 64 | lm loss: 6.445796E+00 | loss scale: 4096.0 | grad norm: 79385.972 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5509/ 159576 | consumed samples: 147824 | elapsed time per iteration (ms): 16347.0 | learning rate: 4.092E-05 | global batch size: 64 | lm loss: 6.421635E+00 | loss scale: 4096.0 | grad norm: 76910.947 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5510/ 159576 | consumed samples: 147888 | elapsed time per iteration (ms): 16379.6 | learning rate: 4.093E-05 | global batch size: 64 | lm loss: 6.403854E+00 | loss scale: 4096.0 | grad norm: 131977.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5511/ 159576 | consumed samples: 147952 | elapsed time per iteration (ms): 16364.3 | learning rate: 4.095E-05 | global batch size: 64 | lm loss: 6.393543E+00 | loss scale: 4096.0 | grad norm: 62655.958 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5512/ 159576 | consumed samples: 148016 | elapsed time per iteration (ms): 16734.0 | learning rate: 4.097E-05 | global batch size: 64 | lm loss: 6.378099E+00 | loss scale: 4096.0 | grad norm: 71057.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5513/ 159576 | consumed samples: 148080 | elapsed time per iteration (ms): 16360.1 | learning rate: 4.099E-05 | global batch size: 64 | lm loss: 6.439700E+00 | loss scale: 4096.0 | grad norm: 78346.761 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5514/ 159576 | consumed samples: 148144 | elapsed time per iteration (ms): 16356.7 | learning rate: 4.101E-05 | global batch size: 64 | lm loss: 6.380426E+00 | loss scale: 4096.0 | grad norm: 65583.994 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5515/ 159576 | consumed samples: 148208 | elapsed time per iteration (ms): 16416.2 | learning rate: 4.102E-05 | global batch size: 64 | lm loss: 6.492000E+00 | loss scale: 4096.0 | grad norm: 73724.763 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5516/ 159576 | consumed samples: 148272 | elapsed time per iteration (ms): 16451.6 | learning rate: 4.104E-05 | global batch size: 64 | lm loss: 6.433869E+00 | loss scale: 4096.0 | grad norm: 93695.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5517/ 159576 | consumed samples: 148336 | elapsed time per iteration (ms): 16367.1 | learning rate: 4.106E-05 | global batch size: 64 | lm loss: 6.316652E+00 | loss scale: 4096.0 | grad norm: 93995.663 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5518/ 159576 | consumed samples: 148400 | elapsed time per iteration (ms): 16352.2 | learning rate: 4.108E-05 | global batch size: 64 | lm loss: 6.331068E+00 | loss scale: 4096.0 | grad norm: 64601.046 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5519/ 159576 | consumed samples: 148464 | elapsed time per iteration (ms): 16660.3 | learning rate: 4.109E-05 | global batch size: 64 | lm loss: 6.441586E+00 | loss scale: 4096.0 | grad norm: 74837.727 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5520/ 159576 | consumed samples: 148528 | elapsed time per iteration (ms): 16346.7 | learning rate: 4.111E-05 | global batch size: 64 | lm loss: 6.422507E+00 | loss scale: 4096.0 | grad norm: 57013.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5521/ 159576 | consumed samples: 148592 | elapsed time per iteration (ms): 16378.9 | learning rate: 4.113E-05 | global batch size: 64 | lm loss: 6.388858E+00 | loss scale: 4096.0 | grad norm: 70843.138 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5522/ 159576 | consumed samples: 148656 | elapsed time per iteration (ms): 16311.3 | learning rate: 4.115E-05 | global batch size: 64 | lm loss: 6.335554E+00 | loss scale: 4096.0 | grad norm: 57811.716 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5523/ 159576 | consumed samples: 148720 | elapsed time per iteration (ms): 16599.0 | learning rate: 4.117E-05 | global batch size: 64 | lm loss: 6.427087E+00 | loss scale: 4096.0 | grad norm: 70169.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5524/ 159576 | consumed samples: 148784 | elapsed time per iteration (ms): 16322.1 | learning rate: 4.118E-05 | global batch size: 64 | lm loss: 6.400644E+00 | loss scale: 4096.0 | grad norm: 65162.867 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5525/ 159576 | consumed samples: 148848 | elapsed time per iteration (ms): 16352.5 | learning rate: 4.120E-05 | global batch size: 64 | lm loss: 6.476854E+00 | loss scale: 4096.0 | grad norm: 105828.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5526/ 159576 | consumed samples: 148912 | elapsed time per iteration (ms): 16357.9 | learning rate: 4.122E-05 | global batch size: 64 | lm loss: 6.444851E+00 | loss scale: 4096.0 | grad norm: 100931.662 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5527/ 159576 | consumed samples: 148976 | elapsed time per iteration (ms): 16656.2 | learning rate: 4.124E-05 | global batch size: 64 | lm loss: 6.448713E+00 | loss scale: 4096.0 | grad norm: 81570.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5528/ 159576 | consumed samples: 149040 | elapsed time per iteration (ms): 16320.4 | learning rate: 4.125E-05 | global batch size: 64 | lm loss: 6.406240E+00 | loss scale: 4096.0 | grad norm: 82766.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5529/ 159576 | consumed samples: 149104 | elapsed time per iteration (ms): 16353.3 | learning rate: 4.127E-05 | global batch size: 64 | lm loss: 6.376573E+00 | loss scale: 4096.0 | grad norm: 80155.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5530/ 159576 | consumed samples: 149168 | elapsed time per iteration (ms): 16695.5 | learning rate: 4.129E-05 | global batch size: 64 | lm loss: 6.316214E+00 | loss scale: 4096.0 | grad norm: 87358.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5531/ 159576 | consumed samples: 149232 | elapsed time per iteration (ms): 16408.8 | learning rate: 4.131E-05 | global batch size: 64 | lm loss: 6.481884E+00 | loss scale: 4096.0 | grad norm: 86550.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5532/ 159576 | consumed samples: 149296 | elapsed time per iteration (ms): 16343.8 | learning rate: 4.133E-05 | global batch size: 64 | lm loss: 6.483734E+00 | loss scale: 4096.0 | grad norm: 89939.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5533/ 159576 | consumed samples: 149360 | elapsed time per iteration (ms): 16370.7 | learning rate: 4.134E-05 | global batch size: 64 | lm loss: 6.318271E+00 | loss scale: 4096.0 | grad norm: 60516.549 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5534/ 159576 | consumed samples: 149424 | elapsed time per iteration (ms): 16594.8 | learning rate: 4.136E-05 | global batch size: 64 | lm loss: 6.391500E+00 | loss scale: 4096.0 | grad norm: 70379.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5535/ 159576 | consumed samples: 149488 | elapsed time per iteration (ms): 16425.6 | learning rate: 4.138E-05 | global batch size: 64 | lm loss: 6.418231E+00 | loss scale: 4096.0 | grad norm: 76225.739 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5536/ 159576 | consumed samples: 149552 | elapsed time per iteration (ms): 16364.4 | learning rate: 4.140E-05 | global batch size: 64 | lm loss: 6.461292E+00 | loss scale: 4096.0 | grad norm: 117347.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5537/ 159576 | consumed samples: 149616 | elapsed time per iteration (ms): 16683.3 | learning rate: 4.141E-05 | global batch size: 64 | lm loss: 6.394395E+00 | loss scale: 4096.0 | grad norm: 113236.928 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5538/ 159576 | consumed samples: 149680 | elapsed time per iteration (ms): 16407.6 | learning rate: 4.143E-05 | global batch size: 64 | lm loss: 6.348366E+00 | loss scale: 4096.0 | grad norm: 72699.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5539/ 159576 | consumed samples: 149744 | elapsed time per iteration (ms): 16372.4 | learning rate: 4.145E-05 | global batch size: 64 | lm loss: 6.395003E+00 | loss scale: 4096.0 | grad norm: 117054.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5540/ 159576 | consumed samples: 149808 | elapsed time per iteration (ms): 16344.7 | learning rate: 4.147E-05 | global batch size: 64 | lm loss: 6.345469E+00 | loss scale: 4096.0 | grad norm: 66826.178 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5541/ 159576 | consumed samples: 149872 | elapsed time per iteration (ms): 16658.7 | learning rate: 4.149E-05 | global batch size: 64 | lm loss: 6.311511E+00 | loss scale: 4096.0 | grad norm: 82398.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5542/ 159576 | consumed samples: 149936 | elapsed time per iteration (ms): 16382.8 | learning rate: 4.150E-05 | global batch size: 64 | lm loss: 6.407408E+00 | loss scale: 4096.0 | grad norm: 95381.993 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5543/ 159576 | consumed samples: 150000 | elapsed time per iteration (ms): 16397.3 | learning rate: 4.152E-05 | global batch size: 64 | lm loss: 6.385950E+00 | loss scale: 4096.0 | grad norm: 84966.860 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5544/ 159576 | consumed samples: 150064 | elapsed time per iteration (ms): 16328.2 | learning rate: 4.154E-05 | global batch size: 64 | lm loss: 6.386173E+00 | loss scale: 4096.0 | grad norm: 76280.982 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5545/ 159576 | consumed samples: 150128 | elapsed time per iteration (ms): 16536.9 | learning rate: 4.156E-05 | global batch size: 64 | lm loss: 6.429965E+00 | loss scale: 4096.0 | grad norm: 86199.770 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5546/ 159576 | consumed samples: 150192 | elapsed time per iteration (ms): 16341.0 | learning rate: 4.157E-05 | global batch size: 64 | lm loss: 6.440814E+00 | loss scale: 4096.0 | grad norm: 79643.661 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5547/ 159576 | consumed samples: 150256 | elapsed time per iteration (ms): 16434.5 | learning rate: 4.159E-05 | global batch size: 64 | lm loss: 6.292027E+00 | loss scale: 4096.0 | grad norm: 79649.706 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5548/ 159576 | consumed samples: 150320 | elapsed time per iteration (ms): 16744.8 | learning rate: 4.161E-05 | global batch size: 64 | lm loss: 6.363777E+00 | loss scale: 4096.0 | grad norm: 105818.884 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5549/ 159576 | consumed samples: 150384 | elapsed time per iteration (ms): 16446.0 | learning rate: 4.163E-05 | global batch size: 64 | lm loss: 6.525520E+00 | loss scale: 4096.0 | grad norm: 98900.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5550/ 159576 | consumed samples: 150448 | elapsed time per iteration (ms): 16313.7 | learning rate: 4.164E-05 | global batch size: 64 | lm loss: 6.426298E+00 | loss scale: 4096.0 | grad norm: 160080.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5551/ 159576 | consumed samples: 150512 | elapsed time per iteration (ms): 16414.2 | learning rate: 4.166E-05 | global batch size: 64 | lm loss: 6.409907E+00 | loss scale: 4096.0 | grad norm: 101291.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 5552/ 159576 | consumed samples: 150576 | elapsed time per iteration (ms): 16772.9 | learning rate: 4.168E-05 | global batch size: 64 | lm loss: 6.312022E+00 | loss scale: 4096.0 | grad norm: 93961.085 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
42786 |
+
iteration 5553/ 159576 | consumed samples: 150640 | elapsed time per iteration (ms): 16393.9 | learning rate: 4.170E-05 | global batch size: 64 | lm loss: 6.460764E+00 | loss scale: 4096.0 | grad norm: 83044.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
42787 |
+
time (ms)
|
42788 |
+
iteration 5554/ 159576 | consumed samples: 150704 | elapsed time per iteration (ms): 16414.7 | learning rate: 4.172E-05 | global batch size: 64 | lm loss: 6.395907E+00 | loss scale: 4096.0 | grad norm: 71935.935 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
42789 |
+
time (ms)
|
42790 |
+
iteration 5555/ 159576 | consumed samples: 150768 | elapsed time per iteration (ms): 16459.3 | learning rate: 4.173E-05 | global batch size: 64 | lm loss: 6.381772E+00 | loss scale: 4096.0 | grad norm: 92358.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
42791 |
+
time (ms)
|
42792 |
+
iteration 5556/ 159576 | consumed samples: 150832 | elapsed time per iteration (ms): 16620.5 | learning rate: 4.175E-05 | global batch size: 64 | lm loss: 6.334103E+00 | loss scale: 4096.0 | grad norm: 135953.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
42793 |
+
time (ms)
|
42794 |
+
iteration 5557/ 159576 | consumed samples: 150896 | elapsed time per iteration (ms): 16420.0 | learning rate: 4.177E-05 | global batch size: 64 | lm loss: 6.350534E+00 | loss scale: 4096.0 | grad norm: 106866.155 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
42795 |
+
time (ms)
|
42796 |
+
iteration 5558/ 159576 | consumed samples: 150960 | elapsed time per iteration (ms): 16394.5 | learning rate: 4.179E-05 | global batch size: 64 | lm loss: 6.449617E+00 | loss scale: 4096.0 | grad norm: 73758.521 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
42797 |
+
time (ms)
|
42798 |
+
iteration 5559/ 159576 | consumed samples: 151024 | elapsed time per iteration (ms): 16702.3 | learning rate: 4.180E-05 | global batch size: 64 | lm loss: 6.422152E+00 | loss scale: 4096.0 | grad norm: 89216.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
42799 |
+
time (ms)
|
42800 |
+
iteration 5560/ 159576 | consumed samples: 151088 | elapsed time per iteration (ms): 16526.0 | learning rate: 4.182E-05 | global batch size: 64 | lm loss: 6.502412E+00 | loss scale: 4096.0 | grad norm: 75899.056 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
42801 |
+
time (ms)
|
42802 |
+
iteration 5561/ 159576 | consumed samples: 151152 | elapsed time per iteration (ms): 16388.8 | learning rate: 4.184E-05 | global batch size: 64 | lm loss: 6.353260E+00 | loss scale: 4096.0 | grad norm: 77216.880 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
42803 |
+
time (ms)
|
42804 |
+
iteration 5562/ 159576 | consumed samples: 151216 | elapsed time per iteration (ms): 16375.8 | learning rate: 4.186E-05 | global batch size: 64 | lm loss: 6.380834E+00 | loss scale: 4096.0 | grad norm: 108978.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
42805 |
+
time (ms)
|
42806 |
+
iteration 5563/ 159576 | consumed samples: 151280 | elapsed time per iteration (ms): 16840.5 | learning rate: 4.188E-05 | global batch size: 64 | lm loss: 6.389106E+00 | loss scale: 4096.0 | grad norm: 109665.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
42807 |
+
time (ms)
|
42808 |
+
iteration 5564/ 159576 | consumed samples: 151344 | elapsed time per iteration (ms): 16437.6 | learning rate: 4.189E-05 | global batch size: 64 | lm loss: 6.440452E+00 | loss scale: 4096.0 | grad norm: 455190.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
42809 |
+
time (ms)
|
42810 |
+
iteration 5565/ 159576 | consumed samples: 151408 | elapsed time per iteration (ms): 16403.9 | learning rate: 4.191E-05 | global batch size: 64 | lm loss: 6.425446E+00 | loss scale: 4096.0 | grad norm: 121150.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
42811 |
+
time (ms)
|
42812 |
+
iteration 5566/ 159576 | consumed samples: 151472 | elapsed time per iteration (ms): 16435.1 | learning rate: 4.193E-05 | global batch size: 64 | lm loss: 6.344089E+00 | loss scale: 4096.0 | grad norm: 92189.151 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
42813 |
+
time (ms)
|
42814 |
+
iteration 5567/ 159576 | consumed samples: 151536 | elapsed time per iteration (ms): 16459.4 | learning rate: 4.195E-05 | global batch size: 64 | lm loss: 6.402337E+00 | loss scale: 4096.0 | grad norm: 84995.771 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
42815 |
+
time (ms)
|
42816 |
+
iteration 5568/ 159576 | consumed samples: 151600 | elapsed time per iteration (ms): 16389.2 | learning rate: 4.196E-05 | global batch size: 64 | lm loss: 6.522965E+00 | loss scale: 4096.0 | grad norm: 82583.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5569/ 159576 | consumed samples: 151664 | elapsed time per iteration (ms): 16371.9 | learning rate: 4.198E-05 | global batch size: 64 | lm loss: 6.357002E+00 | loss scale: 4096.0 | grad norm: 107776.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5570/ 159576 | consumed samples: 151728 | elapsed time per iteration (ms): 16715.6 | learning rate: 4.200E-05 | global batch size: 64 | lm loss: 6.462955E+00 | loss scale: 4096.0 | grad norm: 81656.007 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5571/ 159576 | consumed samples: 151792 | elapsed time per iteration (ms): 16448.5 | learning rate: 4.202E-05 | global batch size: 64 | lm loss: 6.378518E+00 | loss scale: 4096.0 | grad norm: 97168.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5572/ 159576 | consumed samples: 151856 | elapsed time per iteration (ms): 16375.2 | learning rate: 4.204E-05 | global batch size: 64 | lm loss: 6.426227E+00 | loss scale: 4096.0 | grad norm: 138499.065 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5573/ 159576 | consumed samples: 151920 | elapsed time per iteration (ms): 16391.0 | learning rate: 4.205E-05 | global batch size: 64 | lm loss: 6.467142E+00 | loss scale: 4096.0 | grad norm: 86986.159 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5574/ 159576 | consumed samples: 151984 | elapsed time per iteration (ms): 16660.3 | learning rate: 4.207E-05 | global batch size: 64 | lm loss: 6.343758E+00 | loss scale: 4096.0 | grad norm: 94104.183 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5575/ 159576 | consumed samples: 152048 | elapsed time per iteration (ms): 16384.3 | learning rate: 4.209E-05 | global batch size: 64 | lm loss: 6.264513E+00 | loss scale: 4096.0 | grad norm: 84463.915 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5576/ 159576 | consumed samples: 152112 | elapsed time per iteration (ms): 16429.0 | learning rate: 4.211E-05 | global batch size: 64 | lm loss: 6.395695E+00 | loss scale: 4096.0 | grad norm: 91060.071 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5577/ 159576 | consumed samples: 152176 | elapsed time per iteration (ms): 16399.6 | learning rate: 4.212E-05 | global batch size: 64 | lm loss: 6.322819E+00 | loss scale: 4096.0 | grad norm: 78884.092 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5578/ 159576 | consumed samples: 152240 | elapsed time per iteration (ms): 16529.4 | learning rate: 4.214E-05 | global batch size: 64 | lm loss: 6.361033E+00 | loss scale: 4096.0 | grad norm: 132712.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5579/ 159576 | consumed samples: 152304 | elapsed time per iteration (ms): 16454.4 | learning rate: 4.216E-05 | global batch size: 64 | lm loss: 6.276022E+00 | loss scale: 4096.0 | grad norm: 112417.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5580/ 159576 | consumed samples: 152368 | elapsed time per iteration (ms): 16401.1 | learning rate: 4.218E-05 | global batch size: 64 | lm loss: 6.375633E+00 | loss scale: 4096.0 | grad norm: 85824.899 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5581/ 159576 | consumed samples: 152432 | elapsed time per iteration (ms): 16688.1 | learning rate: 4.220E-05 | global batch size: 64 | lm loss: 6.447036E+00 | loss scale: 4096.0 | grad norm: 88314.135 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5582/ 159576 | consumed samples: 152496 | elapsed time per iteration (ms): 16427.8 | learning rate: 4.221E-05 | global batch size: 64 | lm loss: 6.438461E+00 | loss scale: 4096.0 | grad norm: 91826.151 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5583/ 159576 | consumed samples: 152560 | elapsed time per iteration (ms): 16326.4 | learning rate: 4.223E-05 | global batch size: 64 | lm loss: 6.404251E+00 | loss scale: 4096.0 | grad norm: 79746.044 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5584/ 159576 | consumed samples: 152624 | elapsed time per iteration (ms): 16429.7 | learning rate: 4.225E-05 | global batch size: 64 | lm loss: 6.470784E+00 | loss scale: 4096.0 | grad norm: 78255.053 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5585/ 159576 | consumed samples: 152688 | elapsed time per iteration (ms): 16577.7 | learning rate: 4.227E-05 | global batch size: 64 | lm loss: 6.352365E+00 | loss scale: 4096.0 | grad norm: 85894.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5586/ 159576 | consumed samples: 152752 | elapsed time per iteration (ms): 16409.6 | learning rate: 4.228E-05 | global batch size: 64 | lm loss: 6.367690E+00 | loss scale: 4096.0 | grad norm: 268686.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5587/ 159576 | consumed samples: 152816 | elapsed time per iteration (ms): 16393.7 | learning rate: 4.230E-05 | global batch size: 64 | lm loss: 6.334382E+00 | loss scale: 4096.0 | grad norm: 92996.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5588/ 159576 | consumed samples: 152880 | elapsed time per iteration (ms): 16647.8 | learning rate: 4.232E-05 | global batch size: 64 | lm loss: 6.174354E+00 | loss scale: 4096.0 | grad norm: 99570.185 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5589/ 159576 | consumed samples: 152944 | elapsed time per iteration (ms): 16470.5 | learning rate: 4.234E-05 | global batch size: 64 | lm loss: 6.349049E+00 | loss scale: 4096.0 | grad norm: 74523.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5590/ 159576 | consumed samples: 153008 | elapsed time per iteration (ms): 16348.7 | learning rate: 4.236E-05 | global batch size: 64 | lm loss: 6.388356E+00 | loss scale: 4096.0 | grad norm: 57623.843 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5591/ 159576 | consumed samples: 153072 | elapsed time per iteration (ms): 16338.9 | learning rate: 4.237E-05 | global batch size: 64 | lm loss: 6.399694E+00 | loss scale: 4096.0 | grad norm: 75852.068 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5592/ 159576 | consumed samples: 153136 | elapsed time per iteration (ms): 16704.7 | learning rate: 4.239E-05 | global batch size: 64 | lm loss: 6.327959E+00 | loss scale: 4096.0 | grad norm: 69452.758 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5593/ 159576 | consumed samples: 153200 | elapsed time per iteration (ms): 16334.3 | learning rate: 4.241E-05 | global batch size: 64 | lm loss: 6.435533E+00 | loss scale: 4096.0 | grad norm: 111529.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5594/ 159576 | consumed samples: 153264 | elapsed time per iteration (ms): 16385.3 | learning rate: 4.243E-05 | global batch size: 64 | lm loss: 6.438297E+00 | loss scale: 4096.0 | grad norm: 154695.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5595/ 159576 | consumed samples: 153328 | elapsed time per iteration (ms): 16343.1 | learning rate: 4.244E-05 | global batch size: 64 | lm loss: 6.431480E+00 | loss scale: 4096.0 | grad norm: 133987.791 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5596/ 159576 | consumed samples: 153392 | elapsed time per iteration (ms): 16571.5 | learning rate: 4.246E-05 | global batch size: 64 | lm loss: 6.326744E+00 | loss scale: 4096.0 | grad norm: 65072.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5597/ 159576 | consumed samples: 153456 | elapsed time per iteration (ms): 16304.0 | learning rate: 4.248E-05 | global batch size: 64 | lm loss: 6.450805E+00 | loss scale: 4096.0 | grad norm: 67613.081 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5598/ 159576 | consumed samples: 153520 | elapsed time per iteration (ms): 16343.8 | learning rate: 4.250E-05 | global batch size: 64 | lm loss: 6.327376E+00 | loss scale: 4096.0 | grad norm: 77614.563 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5599/ 159576 | consumed samples: 153584 | elapsed time per iteration (ms): 16672.4 | learning rate: 4.251E-05 | global batch size: 64 | lm loss: 6.502485E+00 | loss scale: 4096.0 | grad norm: 97568.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5600/ 159576 | consumed samples: 153648 | elapsed time per iteration (ms): 16410.3 | learning rate: 4.253E-05 | global batch size: 64 | lm loss: 6.429380E+00 | loss scale: 4096.0 | grad norm: 84231.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5601/ 159576 | consumed samples: 153712 | elapsed time per iteration (ms): 16391.0 | learning rate: 4.255E-05 | global batch size: 64 | lm loss: 6.436201E+00 | loss scale: 4096.0 | grad norm: 63319.618 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5602/ 159576 | consumed samples: 153776 | elapsed time per iteration (ms): 16453.8 | learning rate: 4.257E-05 | global batch size: 64 | lm loss: 6.263167E+00 | loss scale: 4096.0 | grad norm: 71392.865 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5603/ 159576 | consumed samples: 153840 | elapsed time per iteration (ms): 16775.3 | learning rate: 4.259E-05 | global batch size: 64 | lm loss: 6.413259E+00 | loss scale: 4096.0 | grad norm: 123761.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5604/ 159576 | consumed samples: 153904 | elapsed time per iteration (ms): 16504.7 | learning rate: 4.260E-05 | global batch size: 64 | lm loss: 6.544505E+00 | loss scale: 4096.0 | grad norm: 83624.860 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5605/ 159576 | consumed samples: 153968 | elapsed time per iteration (ms): 16306.6 | learning rate: 4.262E-05 | global batch size: 64 | lm loss: 6.452788E+00 | loss scale: 8192.0 | grad norm: 65011.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5606/ 159576 | consumed samples: 154032 | elapsed time per iteration (ms): 16378.4 | learning rate: 4.264E-05 | global batch size: 64 | lm loss: 6.422714E+00 | loss scale: 8192.0 | grad norm: 246798.721 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5607/ 159576 | consumed samples: 154096 | elapsed time per iteration (ms): 16552.8 | learning rate: 4.266E-05 | global batch size: 64 | lm loss: 6.375990E+00 | loss scale: 8192.0 | grad norm: 169739.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5608/ 159576 | consumed samples: 154160 | elapsed time per iteration (ms): 16382.8 | learning rate: 4.267E-05 | global batch size: 64 | lm loss: 6.358736E+00 | loss scale: 8192.0 | grad norm: 157950.735 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5609/ 159576 | consumed samples: 154224 | elapsed time per iteration (ms): 16422.0 | learning rate: 4.269E-05 | global batch size: 64 | lm loss: 6.444921E+00 | loss scale: 8192.0 | grad norm: 125911.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5610/ 159576 | consumed samples: 154288 | elapsed time per iteration (ms): 9561.0 | learning rate: 4.269E-05 | global batch size: 64 | lm loss: 6.367582E+00 | loss scale: 8192.0 | grad norm: 125911.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5611/ 159576 | consumed samples: 154352 | elapsed time per iteration (ms): 16020.4 | learning rate: 4.271E-05 | global batch size: 64 | lm loss: 6.341266E+00 | loss scale: 8192.0 | grad norm: 196277.090 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5612/ 159576 | consumed samples: 154416 | elapsed time per iteration (ms): 16411.4 | learning rate: 4.273E-05 | global batch size: 64 | lm loss: 6.386235E+00 | loss scale: 8192.0 | grad norm: 174236.115 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5613/ 159576 | consumed samples: 154480 | elapsed time per iteration (ms): 16406.8 | learning rate: 4.275E-05 | global batch size: 64 | lm loss: 6.302393E+00 | loss scale: 8192.0 | grad norm: 159949.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5614/ 159576 | consumed samples: 154544 | elapsed time per iteration (ms): 16823.0 | learning rate: 4.276E-05 | global batch size: 64 | lm loss: 6.427998E+00 | loss scale: 8192.0 | grad norm: 139822.570 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5615/ 159576 | consumed samples: 154608 | elapsed time per iteration (ms): 16523.9 | learning rate: 4.278E-05 | global batch size: 64 | lm loss: 6.437964E+00 | loss scale: 8192.0 | grad norm: 148561.538 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5616/ 159576 | consumed samples: 154672 | elapsed time per iteration (ms): 16444.1 | learning rate: 4.280E-05 | global batch size: 64 | lm loss: 6.387279E+00 | loss scale: 8192.0 | grad norm: 165172.047 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5617/ 159576 | consumed samples: 154736 | elapsed time per iteration (ms): 16455.6 | learning rate: 4.282E-05 | global batch size: 64 | lm loss: 6.365323E+00 | loss scale: 8192.0 | grad norm: 139740.137 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5618/ 159576 | consumed samples: 154800 | elapsed time per iteration (ms): 16876.6 | learning rate: 4.283E-05 | global batch size: 64 | lm loss: 6.405371E+00 | loss scale: 8192.0 | grad norm: 191865.773 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5619/ 159576 | consumed samples: 154864 | elapsed time per iteration (ms): 16465.6 | learning rate: 4.285E-05 | global batch size: 64 | lm loss: 6.400004E+00 | loss scale: 8192.0 | grad norm: 131301.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5620/ 159576 | consumed samples: 154928 | elapsed time per iteration (ms): 16407.9 | learning rate: 4.287E-05 | global batch size: 64 | lm loss: 6.424757E+00 | loss scale: 8192.0 | grad norm: 152162.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5621/ 159576 | consumed samples: 154992 | elapsed time per iteration (ms): 16429.7 | learning rate: 4.289E-05 | global batch size: 64 | lm loss: 6.415905E+00 | loss scale: 8192.0 | grad norm: 184054.677 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5622/ 159576 | consumed samples: 155056 | elapsed time per iteration (ms): 16685.6 | learning rate: 4.291E-05 | global batch size: 64 | lm loss: 6.440601E+00 | loss scale: 8192.0 | grad norm: 290641.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5623/ 159576 | consumed samples: 155120 | elapsed time per iteration (ms): 16500.9 | learning rate: 4.292E-05 | global batch size: 64 | lm loss: 6.392663E+00 | loss scale: 8192.0 | grad norm: 151394.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5624/ 159576 | consumed samples: 155184 | elapsed time per iteration (ms): 16485.6 | learning rate: 4.294E-05 | global batch size: 64 | lm loss: 6.440325E+00 | loss scale: 8192.0 | grad norm: 132735.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5625/ 159576 | consumed samples: 155248 | elapsed time per iteration (ms): 16832.2 | learning rate: 4.296E-05 | global batch size: 64 | lm loss: 6.382560E+00 | loss scale: 8192.0 | grad norm: 167706.666 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5626/ 159576 | consumed samples: 155312 | elapsed time per iteration (ms): 16294.5 | learning rate: 4.298E-05 | global batch size: 64 | lm loss: 6.422318E+00 | loss scale: 8192.0 | grad norm: 144671.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5627/ 159576 | consumed samples: 155376 | elapsed time per iteration (ms): 16433.6 | learning rate: 4.299E-05 | global batch size: 64 | lm loss: 6.400022E+00 | loss scale: 8192.0 | grad norm: 174837.579 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5628/ 159576 | consumed samples: 155440 | elapsed time per iteration (ms): 16385.0 | learning rate: 4.301E-05 | global batch size: 64 | lm loss: 6.465958E+00 | loss scale: 8192.0 | grad norm: 167317.809 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5629/ 159576 | consumed samples: 155504 | elapsed time per iteration (ms): 16829.3 | learning rate: 4.303E-05 | global batch size: 64 | lm loss: 6.365539E+00 | loss scale: 8192.0 | grad norm: 150073.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5630/ 159576 | consumed samples: 155568 | elapsed time per iteration (ms): 16533.0 | learning rate: 4.305E-05 | global batch size: 64 | lm loss: 6.385098E+00 | loss scale: 8192.0 | grad norm: 132923.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5631/ 159576 | consumed samples: 155632 | elapsed time per iteration (ms): 16451.7 | learning rate: 4.307E-05 | global batch size: 64 | lm loss: 6.314290E+00 | loss scale: 8192.0 | grad norm: 178222.521 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5632/ 159576 | consumed samples: 155696 | elapsed time per iteration (ms): 16400.8 | learning rate: 4.308E-05 | global batch size: 64 | lm loss: 6.467572E+00 | loss scale: 8192.0 | grad norm: 147545.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5633/ 159576 | consumed samples: 155760 | elapsed time per iteration (ms): 16566.1 | learning rate: 4.310E-05 | global batch size: 64 | lm loss: 6.341013E+00 | loss scale: 8192.0 | grad norm: 200712.657 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5634/ 159576 | consumed samples: 155824 | elapsed time per iteration (ms): 16393.9 | learning rate: 4.312E-05 | global batch size: 64 | lm loss: 6.319093E+00 | loss scale: 8192.0 | grad norm: 161666.056 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5635/ 159576 | consumed samples: 155888 | elapsed time per iteration (ms): 16416.9 | learning rate: 4.314E-05 | global batch size: 64 | lm loss: 6.461274E+00 | loss scale: 8192.0 | grad norm: 572124.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5636/ 159576 | consumed samples: 155952 | elapsed time per iteration (ms): 16756.4 | learning rate: 4.315E-05 | global batch size: 64 | lm loss: 6.453969E+00 | loss scale: 8192.0 | grad norm: 205582.781 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5637/ 159576 | consumed samples: 156016 | elapsed time per iteration (ms): 16349.2 | learning rate: 4.317E-05 | global batch size: 64 | lm loss: 6.386354E+00 | loss scale: 8192.0 | grad norm: 188662.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5638/ 159576 | consumed samples: 156080 | elapsed time per iteration (ms): 16437.2 | learning rate: 4.319E-05 | global batch size: 64 | lm loss: 6.458478E+00 | loss scale: 8192.0 | grad norm: 208129.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5639/ 159576 | consumed samples: 156144 | elapsed time per iteration (ms): 16478.4 | learning rate: 4.321E-05 | global batch size: 64 | lm loss: 6.361111E+00 | loss scale: 8192.0 | grad norm: 383224.074 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5640/ 159576 | consumed samples: 156208 | elapsed time per iteration (ms): 16543.3 | learning rate: 4.322E-05 | global batch size: 64 | lm loss: 6.470639E+00 | loss scale: 8192.0 | grad norm: 244281.048 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5641/ 159576 | consumed samples: 156272 | elapsed time per iteration (ms): 16418.6 | learning rate: 4.324E-05 | global batch size: 64 | lm loss: 6.453573E+00 | loss scale: 8192.0 | grad norm: 246555.042 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5642/ 159576 | consumed samples: 156336 | elapsed time per iteration (ms): 16347.0 | learning rate: 4.326E-05 | global batch size: 64 | lm loss: 6.416644E+00 | loss scale: 8192.0 | grad norm: 177394.161 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5643/ 159576 | consumed samples: 156400 | elapsed time per iteration (ms): 9564.0 | learning rate: 4.326E-05 | global batch size: 64 | lm loss: 6.433064E+00 | loss scale: 4096.0 | grad norm: 177394.161 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5644/ 159576 | consumed samples: 156464 | elapsed time per iteration (ms): 16246.5 | learning rate: 4.328E-05 | global batch size: 64 | lm loss: 6.334921E+00 | loss scale: 4096.0 | grad norm: 91031.712 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5645/ 159576 | consumed samples: 156528 | elapsed time per iteration (ms): 16410.8 | learning rate: 4.330E-05 | global batch size: 64 | lm loss: 6.398224E+00 | loss scale: 4096.0 | grad norm: 82899.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5646/ 159576 | consumed samples: 156592 | elapsed time per iteration (ms): 16332.5 | learning rate: 4.331E-05 | global batch size: 64 | lm loss: 6.469447E+00 | loss scale: 4096.0 | grad norm: 93235.700 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5647/ 159576 | consumed samples: 156656 | elapsed time per iteration (ms): 16380.9 | learning rate: 4.333E-05 | global batch size: 64 | lm loss: 6.414939E+00 | loss scale: 4096.0 | grad norm: 98498.938 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5648/ 159576 | consumed samples: 156720 | elapsed time per iteration (ms): 16453.9 | learning rate: 4.335E-05 | global batch size: 64 | lm loss: 6.435335E+00 | loss scale: 4096.0 | grad norm: 110431.475 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5649/ 159576 | consumed samples: 156784 | elapsed time per iteration (ms): 16375.1 | learning rate: 4.337E-05 | global batch size: 64 | lm loss: 6.367991E+00 | loss scale: 4096.0 | grad norm: 112025.804 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5650/ 159576 | consumed samples: 156848 | elapsed time per iteration (ms): 16396.5 | learning rate: 4.338E-05 | global batch size: 64 | lm loss: 6.453450E+00 | loss scale: 4096.0 | grad norm: 142538.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5651/ 159576 | consumed samples: 156912 | elapsed time per iteration (ms): 16662.1 | learning rate: 4.340E-05 | global batch size: 64 | lm loss: 6.376512E+00 | loss scale: 4096.0 | grad norm: 104884.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5652/ 159576 | consumed samples: 156976 | elapsed time per iteration (ms): 16397.7 | learning rate: 4.342E-05 | global batch size: 64 | lm loss: 6.398083E+00 | loss scale: 4096.0 | grad norm: 97434.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5653/ 159576 | consumed samples: 157040 | elapsed time per iteration (ms): 16367.3 | learning rate: 4.344E-05 | global batch size: 64 | lm loss: 6.468301E+00 | loss scale: 4096.0 | grad norm: 189503.731 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5654/ 159576 | consumed samples: 157104 | elapsed time per iteration (ms): 16332.7 | learning rate: 4.346E-05 | global batch size: 64 | lm loss: 6.449702E+00 | loss scale: 4096.0 | grad norm: 101635.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5655/ 159576 | consumed samples: 157168 | elapsed time per iteration (ms): 16814.3 | learning rate: 4.347E-05 | global batch size: 64 | lm loss: 6.417078E+00 | loss scale: 4096.0 | grad norm: 163445.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5656/ 159576 | consumed samples: 157232 | elapsed time per iteration (ms): 16304.4 | learning rate: 4.349E-05 | global batch size: 64 | lm loss: 6.445296E+00 | loss scale: 4096.0 | grad norm: 90409.939 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5657/ 159576 | consumed samples: 157296 | elapsed time per iteration (ms): 16400.9 | learning rate: 4.351E-05 | global batch size: 64 | lm loss: 6.445564E+00 | loss scale: 4096.0 | grad norm: 81513.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5658/ 159576 | consumed samples: 157360 | elapsed time per iteration (ms): 16340.5 | learning rate: 4.353E-05 | global batch size: 64 | lm loss: 6.333720E+00 | loss scale: 4096.0 | grad norm: 134428.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5659/ 159576 | consumed samples: 157424 | elapsed time per iteration (ms): 16553.5 | learning rate: 4.354E-05 | global batch size: 64 | lm loss: 6.401426E+00 | loss scale: 4096.0 | grad norm: 106022.946 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5660/ 159576 | consumed samples: 157488 | elapsed time per iteration (ms): 16387.3 | learning rate: 4.356E-05 | global batch size: 64 | lm loss: 6.438997E+00 | loss scale: 4096.0 | grad norm: 83955.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5661/ 159576 | consumed samples: 157552 | elapsed time per iteration (ms): 16456.3 | learning rate: 4.358E-05 | global batch size: 64 | lm loss: 6.402083E+00 | loss scale: 4096.0 | grad norm: 85068.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5662/ 159576 | consumed samples: 157616 | elapsed time per iteration (ms): 16696.8 | learning rate: 4.360E-05 | global batch size: 64 | lm loss: 6.441435E+00 | loss scale: 4096.0 | grad norm: 101578.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5663/ 159576 | consumed samples: 157680 | elapsed time per iteration (ms): 16497.3 | learning rate: 4.362E-05 | global batch size: 64 | lm loss: 6.405056E+00 | loss scale: 4096.0 | grad norm: 90814.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5664/ 159576 | consumed samples: 157744 | elapsed time per iteration (ms): 16393.8 | learning rate: 4.363E-05 | global batch size: 64 | lm loss: 6.437488E+00 | loss scale: 4096.0 | grad norm: 99258.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5665/ 159576 | consumed samples: 157808 | elapsed time per iteration (ms): 16464.8 | learning rate: 4.365E-05 | global batch size: 64 | lm loss: 6.461691E+00 | loss scale: 4096.0 | grad norm: 150615.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5666/ 159576 | consumed samples: 157872 | elapsed time per iteration (ms): 16442.6 | learning rate: 4.367E-05 | global batch size: 64 | lm loss: 6.379485E+00 | loss scale: 4096.0 | grad norm: 87553.112 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5667/ 159576 | consumed samples: 157936 | elapsed time per iteration (ms): 16408.0 | learning rate: 4.369E-05 | global batch size: 64 | lm loss: 6.436778E+00 | loss scale: 4096.0 | grad norm: 86837.058 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5668/ 159576 | consumed samples: 158000 | elapsed time per iteration (ms): 16382.6 | learning rate: 4.370E-05 | global batch size: 64 | lm loss: 6.456222E+00 | loss scale: 4096.0 | grad norm: 81561.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5669/ 159576 | consumed samples: 158064 | elapsed time per iteration (ms): 16606.9 | learning rate: 4.372E-05 | global batch size: 64 | lm loss: 6.394565E+00 | loss scale: 4096.0 | grad norm: 90655.669 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5670/ 159576 | consumed samples: 158128 | elapsed time per iteration (ms): 16482.0 | learning rate: 4.374E-05 | global batch size: 64 | lm loss: 6.388999E+00 | loss scale: 4096.0 | grad norm: 139861.145 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5671/ 159576 | consumed samples: 158192 | elapsed time per iteration (ms): 16430.2 | learning rate: 4.376E-05 | global batch size: 64 | lm loss: 6.348672E+00 | loss scale: 4096.0 | grad norm: 79933.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5672/ 159576 | consumed samples: 158256 | elapsed time per iteration (ms): 16343.5 | learning rate: 4.378E-05 | global batch size: 64 | lm loss: 6.358377E+00 | loss scale: 4096.0 | grad norm: 91907.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5673/ 159576 | consumed samples: 158320 | elapsed time per iteration (ms): 16738.6 | learning rate: 4.379E-05 | global batch size: 64 | lm loss: 6.397278E+00 | loss scale: 4096.0 | grad norm: 81347.015 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5674/ 159576 | consumed samples: 158384 | elapsed time per iteration (ms): 16377.1 | learning rate: 4.381E-05 | global batch size: 64 | lm loss: 6.330511E+00 | loss scale: 4096.0 | grad norm: 87623.840 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5675/ 159576 | consumed samples: 158448 | elapsed time per iteration (ms): 16376.8 | learning rate: 4.383E-05 | global batch size: 64 | lm loss: 6.400737E+00 | loss scale: 4096.0 | grad norm: 86243.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5676/ 159576 | consumed samples: 158512 | elapsed time per iteration (ms): 16407.2 | learning rate: 4.385E-05 | global batch size: 64 | lm loss: 6.373343E+00 | loss scale: 4096.0 | grad norm: 112233.960 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5677/ 159576 | consumed samples: 158576 | elapsed time per iteration (ms): 16504.3 | learning rate: 4.386E-05 | global batch size: 64 | lm loss: 6.340403E+00 | loss scale: 4096.0 | grad norm: 87545.481 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5678/ 159576 | consumed samples: 158640 | elapsed time per iteration (ms): 16469.6 | learning rate: 4.388E-05 | global batch size: 64 | lm loss: 6.483582E+00 | loss scale: 4096.0 | grad norm: 85898.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5679/ 159576 | consumed samples: 158704 | elapsed time per iteration (ms): 16363.2 | learning rate: 4.390E-05 | global batch size: 64 | lm loss: 6.384809E+00 | loss scale: 4096.0 | grad norm: 75822.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5680/ 159576 | consumed samples: 158768 | elapsed time per iteration (ms): 16705.5 | learning rate: 4.392E-05 | global batch size: 64 | lm loss: 6.360985E+00 | loss scale: 4096.0 | grad norm: 93411.572 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5681/ 159576 | consumed samples: 158832 | elapsed time per iteration (ms): 16533.6 | learning rate: 4.393E-05 | global batch size: 64 | lm loss: 6.346332E+00 | loss scale: 4096.0 | grad norm: 98347.186 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5682/ 159576 | consumed samples: 158896 | elapsed time per iteration (ms): 16424.8 | learning rate: 4.395E-05 | global batch size: 64 | lm loss: 6.452760E+00 | loss scale: 4096.0 | grad norm: 113842.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5683/ 159576 | consumed samples: 158960 | elapsed time per iteration (ms): 16412.1 | learning rate: 4.397E-05 | global batch size: 64 | lm loss: 6.394449E+00 | loss scale: 4096.0 | grad norm: 225192.085 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5684/ 159576 | consumed samples: 159024 | elapsed time per iteration (ms): 16934.4 | learning rate: 4.399E-05 | global batch size: 64 | lm loss: 6.394941E+00 | loss scale: 4096.0 | grad norm: 81396.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5685/ 159576 | consumed samples: 159088 | elapsed time per iteration (ms): 16454.0 | learning rate: 4.401E-05 | global batch size: 64 | lm loss: 6.261321E+00 | loss scale: 4096.0 | grad norm: 86149.759 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5686/ 159576 | consumed samples: 159152 | elapsed time per iteration (ms): 16431.5 | learning rate: 4.402E-05 | global batch size: 64 | lm loss: 6.492159E+00 | loss scale: 4096.0 | grad norm: 119300.666 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5687/ 159576 | consumed samples: 159216 | elapsed time per iteration (ms): 16386.6 | learning rate: 4.404E-05 | global batch size: 64 | lm loss: 6.511878E+00 | loss scale: 4096.0 | grad norm: 91338.030 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5688/ 159576 | consumed samples: 159280 | elapsed time per iteration (ms): 16584.3 | learning rate: 4.406E-05 | global batch size: 64 | lm loss: 6.442091E+00 | loss scale: 4096.0 | grad norm: 127329.065 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5689/ 159576 | consumed samples: 159344 | elapsed time per iteration (ms): 16414.9 | learning rate: 4.408E-05 | global batch size: 64 | lm loss: 6.445393E+00 | loss scale: 4096.0 | grad norm: 74818.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5690/ 159576 | consumed samples: 159408 | elapsed time per iteration (ms): 16438.8 | learning rate: 4.409E-05 | global batch size: 64 | lm loss: 6.349149E+00 | loss scale: 4096.0 | grad norm: 90721.765 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5691/ 159576 | consumed samples: 159472 | elapsed time per iteration (ms): 16762.3 | learning rate: 4.411E-05 | global batch size: 64 | lm loss: 6.450273E+00 | loss scale: 4096.0 | grad norm: 84948.864 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5692/ 159576 | consumed samples: 159536 | elapsed time per iteration (ms): 16461.8 | learning rate: 4.413E-05 | global batch size: 64 | lm loss: 6.451497E+00 | loss scale: 4096.0 | grad norm: 160376.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5693/ 159576 | consumed samples: 159600 | elapsed time per iteration (ms): 16376.8 | learning rate: 4.415E-05 | global batch size: 64 | lm loss: 6.414182E+00 | loss scale: 4096.0 | grad norm: 64931.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5694/ 159576 | consumed samples: 159664 | elapsed time per iteration (ms): 16448.9 | learning rate: 4.417E-05 | global batch size: 64 | lm loss: 6.392116E+00 | loss scale: 4096.0 | grad norm: 82604.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5695/ 159576 | consumed samples: 159728 | elapsed time per iteration (ms): 16621.3 | learning rate: 4.418E-05 | global batch size: 64 | lm loss: 6.379553E+00 | loss scale: 4096.0 | grad norm: 96286.790 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5696/ 159576 | consumed samples: 159792 | elapsed time per iteration (ms): 16447.4 | learning rate: 4.420E-05 | global batch size: 64 | lm loss: 6.319911E+00 | loss scale: 4096.0 | grad norm: 113489.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5697/ 159576 | consumed samples: 159856 | elapsed time per iteration (ms): 16402.2 | learning rate: 4.422E-05 | global batch size: 64 | lm loss: 6.538674E+00 | loss scale: 4096.0 | grad norm: 88380.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5698/ 159576 | consumed samples: 159920 | elapsed time per iteration (ms): 16462.3 | learning rate: 4.424E-05 | global batch size: 64 | lm loss: 6.450464E+00 | loss scale: 4096.0 | grad norm: 86213.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5699/ 159576 | consumed samples: 159984 | elapsed time per iteration (ms): 16600.9 | learning rate: 4.425E-05 | global batch size: 64 | lm loss: 6.360521E+00 | loss scale: 4096.0 | grad norm: 102839.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
iteration 5700/ 159576 | consumed samples: 160048 | elapsed time per iteration (ms): 16472.4 | learning rate: 4.427E-05 | global batch size: 64 | lm loss: 6.408503E+00 | loss scale: 4096.0 | grad norm: 93708.692 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms)
[2021-09-25 01:07:57] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1165978_[1-10%1] on 'gpu_p13' partition)
[2021-09-25 01:07:57] PULSE: tr8-104B is running for 19:15:46 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])