bigscience-bot
commited on
Commit
•
7abe266
1
Parent(s):
f68d57e
new data
Browse files- logs/main_log.txt +270 -0
logs/main_log.txt
CHANGED
@@ -22434,3 +22434,273 @@ time (ms)
|
|
22434 |
[2021-09-24 05:03:31] PULSE: tr8-104B is running for 1:02:26 since 2021-09-24T04:01:05 (1162747 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
|
22435 |
iteration 733/ 159576 | consumed samples: 11728 | elapsed time per iteration (ms): 13725.2 | learning rate: 3.253E-06 | global batch size: 16 | lm loss: 7.808900E+00 | loss scale: 8192.0 | grad norm: 50692.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22436 |
time (ms)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
22434 |
[2021-09-24 05:03:31] PULSE: tr8-104B is running for 1:02:26 since 2021-09-24T04:01:05 (1162747 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
|
22435 |
iteration 733/ 159576 | consumed samples: 11728 | elapsed time per iteration (ms): 13725.2 | learning rate: 3.253E-06 | global batch size: 16 | lm loss: 7.808900E+00 | loss scale: 8192.0 | grad norm: 50692.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22436 |
time (ms)
|
22437 |
+
iteration 734/ 159576 | consumed samples: 11744 | elapsed time per iteration (ms): 13115.2 | learning rate: 3.257E-06 | global batch size: 16 | lm loss: 7.737029E+00 | loss scale: 8192.0 | grad norm: 69149.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22438 |
+
time (ms)
|
22439 |
+
iteration 735/ 159576 | consumed samples: 11760 | elapsed time per iteration (ms): 13493.9 | learning rate: 3.262E-06 | global batch size: 16 | lm loss: 7.630354E+00 | loss scale: 8192.0 | grad norm: 85240.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22440 |
+
time (ms)
|
22441 |
+
iteration 736/ 159576 | consumed samples: 11776 | elapsed time per iteration (ms): 13636.0 | learning rate: 3.266E-06 | global batch size: 16 | lm loss: 7.626644E+00 | loss scale: 8192.0 | grad norm: 57646.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22442 |
+
time (ms)
|
22443 |
+
iteration 737/ 159576 | consumed samples: 11792 | elapsed time per iteration (ms): 13810.1 | learning rate: 3.271E-06 | global batch size: 16 | lm loss: 7.526936E+00 | loss scale: 8192.0 | grad norm: 95065.076 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22444 |
+
time (ms)
|
22445 |
+
iteration 738/ 159576 | consumed samples: 11808 | elapsed time per iteration (ms): 13385.6 | learning rate: 3.275E-06 | global batch size: 16 | lm loss: 7.820796E+00 | loss scale: 8192.0 | grad norm: 113407.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22446 |
+
time (ms)
|
22447 |
+
iteration 739/ 159576 | consumed samples: 11824 | elapsed time per iteration (ms): 13689.8 | learning rate: 3.280E-06 | global batch size: 16 | lm loss: 7.774467E+00 | loss scale: 8192.0 | grad norm: 98657.078 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22448 |
+
time (ms)
|
22449 |
+
iteration 740/ 159576 | consumed samples: 11840 | elapsed time per iteration (ms): 13965.2 | learning rate: 3.284E-06 | global batch size: 16 | lm loss: 7.762564E+00 | loss scale: 8192.0 | grad norm: 71745.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22450 |
+
time (ms)
|
22451 |
+
iteration 741/ 159576 | consumed samples: 11856 | elapsed time per iteration (ms): 13569.2 | learning rate: 3.288E-06 | global batch size: 16 | lm loss: 7.608281E+00 | loss scale: 8192.0 | grad norm: 40905.544 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22452 |
+
time (ms)
|
22453 |
+
iteration 742/ 159576 | consumed samples: 11872 | elapsed time per iteration (ms): 13635.8 | learning rate: 3.293E-06 | global batch size: 16 | lm loss: 7.570668E+00 | loss scale: 8192.0 | grad norm: 80257.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22454 |
+
time (ms)
|
22455 |
+
iteration 743/ 159576 | consumed samples: 11888 | elapsed time per iteration (ms): 13669.8 | learning rate: 3.297E-06 | global batch size: 16 | lm loss: 7.586653E+00 | loss scale: 8192.0 | grad norm: 56412.186 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22456 |
+
time (ms)
|
22457 |
+
iteration 744/ 159576 | consumed samples: 11904 | elapsed time per iteration (ms): 13473.9 | learning rate: 3.302E-06 | global batch size: 16 | lm loss: 7.701398E+00 | loss scale: 8192.0 | grad norm: 100221.753 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22458 |
+
time (ms)
|
22459 |
+
iteration 745/ 159576 | consumed samples: 11920 | elapsed time per iteration (ms): 13453.8 | learning rate: 3.306E-06 | global batch size: 16 | lm loss: 7.772648E+00 | loss scale: 8192.0 | grad norm: 88519.971 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22460 |
+
time (ms)
|
22461 |
+
iteration 746/ 159576 | consumed samples: 11936 | elapsed time per iteration (ms): 13732.5 | learning rate: 3.311E-06 | global batch size: 16 | lm loss: 7.940891E+00 | loss scale: 8192.0 | grad norm: 66980.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22462 |
+
time (ms)
|
22463 |
+
iteration 747/ 159576 | consumed samples: 11952 | elapsed time per iteration (ms): 13956.5 | learning rate: 3.315E-06 | global batch size: 16 | lm loss: 7.879022E+00 | loss scale: 8192.0 | grad norm: 73008.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22464 |
+
time (ms)
|
22465 |
+
iteration 748/ 159576 | consumed samples: 11968 | elapsed time per iteration (ms): 13250.5 | learning rate: 3.320E-06 | global batch size: 16 | lm loss: 7.693480E+00 | loss scale: 8192.0 | grad norm: 45346.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22466 |
+
time (ms)
|
22467 |
+
iteration 749/ 159576 | consumed samples: 11984 | elapsed time per iteration (ms): 13529.3 | learning rate: 3.324E-06 | global batch size: 16 | lm loss: 7.658270E+00 | loss scale: 8192.0 | grad norm: 156261.718 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22468 |
+
time (ms)
|
22469 |
+
iteration 750/ 159576 | consumed samples: 12000 | elapsed time per iteration (ms): 14110.0 | learning rate: 3.328E-06 | global batch size: 16 | lm loss: 7.741945E+00 | loss scale: 8192.0 | grad norm: 121818.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22470 |
+
time (ms)
|
22471 |
+
iteration 751/ 159576 | consumed samples: 12016 | elapsed time per iteration (ms): 13463.3 | learning rate: 3.333E-06 | global batch size: 16 | lm loss: 7.631550E+00 | loss scale: 8192.0 | grad norm: 69835.617 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22472 |
+
time (ms)
|
22473 |
+
iteration 752/ 159576 | consumed samples: 12032 | elapsed time per iteration (ms): 13424.2 | learning rate: 3.337E-06 | global batch size: 16 | lm loss: 7.669878E+00 | loss scale: 8192.0 | grad norm: 47821.077 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22474 |
+
time (ms)
|
22475 |
+
iteration 753/ 159576 | consumed samples: 12048 | elapsed time per iteration (ms): 13566.2 | learning rate: 3.342E-06 | global batch size: 16 | lm loss: 7.567214E+00 | loss scale: 8192.0 | grad norm: 68234.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22476 |
+
time (ms)
|
22477 |
+
iteration 754/ 159576 | consumed samples: 12064 | elapsed time per iteration (ms): 14065.3 | learning rate: 3.346E-06 | global batch size: 16 | lm loss: 7.753268E+00 | loss scale: 8192.0 | grad norm: 134900.848 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22478 |
+
time (ms)
|
22479 |
+
iteration 755/ 159576 | consumed samples: 12080 | elapsed time per iteration (ms): 13518.6 | learning rate: 3.351E-06 | global batch size: 16 | lm loss: 7.552173E+00 | loss scale: 8192.0 | grad norm: 48964.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22480 |
+
time (ms)
|
22481 |
+
iteration 756/ 159576 | consumed samples: 12096 | elapsed time per iteration (ms): 13728.7 | learning rate: 3.355E-06 | global batch size: 16 | lm loss: 7.735795E+00 | loss scale: 8192.0 | grad norm: 73204.769 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22482 |
+
time (ms)
|
22483 |
+
iteration 757/ 159576 | consumed samples: 12112 | elapsed time per iteration (ms): 14082.3 | learning rate: 3.359E-06 | global batch size: 16 | lm loss: 7.910018E+00 | loss scale: 8192.0 | grad norm: 83429.905 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22484 |
+
time (ms)
|
22485 |
+
iteration 758/ 159576 | consumed samples: 12128 | elapsed time per iteration (ms): 13428.5 | learning rate: 3.364E-06 | global batch size: 16 | lm loss: 7.669195E+00 | loss scale: 8192.0 | grad norm: 61137.847 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22486 |
+
time (ms)
|
22487 |
+
iteration 759/ 159576 | consumed samples: 12144 | elapsed time per iteration (ms): 13632.1 | learning rate: 3.368E-06 | global batch size: 16 | lm loss: 7.795278E+00 | loss scale: 8192.0 | grad norm: 59141.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22488 |
+
time (ms)
|
22489 |
+
iteration 760/ 159576 | consumed samples: 12160 | elapsed time per iteration (ms): 13624.6 | learning rate: 3.373E-06 | global batch size: 16 | lm loss: 7.692988E+00 | loss scale: 8192.0 | grad norm: 104447.460 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22490 |
+
time (ms)
|
22491 |
+
iteration 761/ 159576 | consumed samples: 12176 | elapsed time per iteration (ms): 13611.0 | learning rate: 3.377E-06 | global batch size: 16 | lm loss: 7.784515E+00 | loss scale: 8192.0 | grad norm: 51368.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22492 |
+
time (ms)
|
22493 |
+
iteration 762/ 159576 | consumed samples: 12192 | elapsed time per iteration (ms): 13558.6 | learning rate: 3.382E-06 | global batch size: 16 | lm loss: 7.582584E+00 | loss scale: 8192.0 | grad norm: 61983.639 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22494 |
+
time (ms)
|
22495 |
+
iteration 763/ 159576 | consumed samples: 12208 | elapsed time per iteration (ms): 13793.4 | learning rate: 3.386E-06 | global batch size: 16 | lm loss: 7.743572E+00 | loss scale: 8192.0 | grad norm: 56837.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22496 |
+
time (ms)
|
22497 |
+
iteration 764/ 159576 | consumed samples: 12224 | elapsed time per iteration (ms): 13743.7 | learning rate: 3.391E-06 | global batch size: 16 | lm loss: 7.701952E+00 | loss scale: 8192.0 | grad norm: 92476.492 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22498 |
+
time (ms)
|
22499 |
+
iteration 765/ 159576 | consumed samples: 12240 | elapsed time per iteration (ms): 13529.8 | learning rate: 3.395E-06 | global batch size: 16 | lm loss: 7.691103E+00 | loss scale: 8192.0 | grad norm: 103276.953 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22500 |
+
time (ms)
|
22501 |
+
iteration 766/ 159576 | consumed samples: 12256 | elapsed time per iteration (ms): 13189.2 | learning rate: 3.399E-06 | global batch size: 16 | lm loss: 7.589336E+00 | loss scale: 8192.0 | grad norm: 54735.017 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22502 |
+
time (ms)
|
22503 |
+
iteration 767/ 159576 | consumed samples: 12272 | elapsed time per iteration (ms): 13483.6 | learning rate: 3.404E-06 | global batch size: 16 | lm loss: 7.717595E+00 | loss scale: 8192.0 | grad norm: 54456.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22504 |
+
time (ms)
|
22505 |
+
iteration 768/ 159576 | consumed samples: 12288 | elapsed time per iteration (ms): 13780.9 | learning rate: 3.408E-06 | global batch size: 16 | lm loss: 7.852913E+00 | loss scale: 8192.0 | grad norm: 88912.086 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22506 |
+
time (ms)
|
22507 |
+
iteration 769/ 159576 | consumed samples: 12304 | elapsed time per iteration (ms): 13724.3 | learning rate: 3.413E-06 | global batch size: 16 | lm loss: 7.716819E+00 | loss scale: 8192.0 | grad norm: 102833.662 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22508 |
+
time (ms)
|
22509 |
+
iteration 770/ 159576 | consumed samples: 12320 | elapsed time per iteration (ms): 13377.3 | learning rate: 3.417E-06 | global batch size: 16 | lm loss: 7.597641E+00 | loss scale: 8192.0 | grad norm: 50835.662 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22510 |
+
time (ms)
|
22511 |
+
iteration 771/ 159576 | consumed samples: 12336 | elapsed time per iteration (ms): 13692.5 | learning rate: 3.422E-06 | global batch size: 16 | lm loss: 7.478999E+00 | loss scale: 8192.0 | grad norm: 53587.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22512 |
+
time (ms)
|
22513 |
+
iteration 772/ 159576 | consumed samples: 12352 | elapsed time per iteration (ms): 14180.5 | learning rate: 3.426E-06 | global batch size: 16 | lm loss: 7.546258E+00 | loss scale: 8192.0 | grad norm: 63294.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22514 |
+
time (ms)
|
22515 |
+
iteration 773/ 159576 | consumed samples: 12368 | elapsed time per iteration (ms): 13096.5 | learning rate: 3.430E-06 | global batch size: 16 | lm loss: 7.711743E+00 | loss scale: 8192.0 | grad norm: 99934.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22516 |
+
time (ms)
|
22517 |
+
iteration 774/ 159576 | consumed samples: 12384 | elapsed time per iteration (ms): 13520.5 | learning rate: 3.435E-06 | global batch size: 16 | lm loss: 7.645664E+00 | loss scale: 8192.0 | grad norm: 56458.777 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22518 |
+
time (ms)
|
22519 |
+
iteration 775/ 159576 | consumed samples: 12400 | elapsed time per iteration (ms): 13630.5 | learning rate: 3.439E-06 | global batch size: 16 | lm loss: 7.603559E+00 | loss scale: 8192.0 | grad norm: 46450.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22520 |
+
time (ms)
|
22521 |
+
iteration 776/ 159576 | consumed samples: 12416 | elapsed time per iteration (ms): 14027.6 | learning rate: 3.444E-06 | global batch size: 16 | lm loss: 7.737686E+00 | loss scale: 8192.0 | grad norm: 141770.957 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22522 |
+
time (ms)
|
22523 |
+
iteration 777/ 159576 | consumed samples: 12432 | elapsed time per iteration (ms): 13425.6 | learning rate: 3.448E-06 | global batch size: 16 | lm loss: 7.584914E+00 | loss scale: 8192.0 | grad norm: 124071.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22524 |
+
time (ms)
|
22525 |
+
iteration 778/ 159576 | consumed samples: 12448 | elapsed time per iteration (ms): 13642.7 | learning rate: 3.453E-06 | global batch size: 16 | lm loss: 7.606685E+00 | loss scale: 8192.0 | grad norm: 53139.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22526 |
+
time (ms)
|
22527 |
+
iteration 779/ 159576 | consumed samples: 12464 | elapsed time per iteration (ms): 13834.1 | learning rate: 3.457E-06 | global batch size: 16 | lm loss: 7.786515E+00 | loss scale: 8192.0 | grad norm: 58657.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22528 |
+
time (ms)
|
22529 |
+
iteration 780/ 159576 | consumed samples: 12480 | elapsed time per iteration (ms): 13091.5 | learning rate: 3.462E-06 | global batch size: 16 | lm loss: 7.618142E+00 | loss scale: 8192.0 | grad norm: 37881.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22530 |
+
time (ms)
|
22531 |
+
iteration 781/ 159576 | consumed samples: 12496 | elapsed time per iteration (ms): 14146.0 | learning rate: 3.466E-06 | global batch size: 16 | lm loss: 7.906812E+00 | loss scale: 8192.0 | grad norm: 114163.942 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22532 |
+
time (ms)
|
22533 |
+
iteration 782/ 159576 | consumed samples: 12512 | elapsed time per iteration (ms): 14025.7 | learning rate: 3.470E-06 | global batch size: 16 | lm loss: 7.566094E+00 | loss scale: 8192.0 | grad norm: 46220.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22534 |
+
time (ms)
|
22535 |
+
iteration 783/ 159576 | consumed samples: 12528 | elapsed time per iteration (ms): 13895.4 | learning rate: 3.475E-06 | global batch size: 16 | lm loss: 7.630446E+00 | loss scale: 8192.0 | grad norm: 64319.125 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22536 |
+
time (ms)
|
22537 |
+
iteration 784/ 159576 | consumed samples: 12544 | elapsed time per iteration (ms): 13890.1 | learning rate: 3.479E-06 | global batch size: 16 | lm loss: 7.692337E+00 | loss scale: 8192.0 | grad norm: 48575.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22538 |
+
time (ms)
|
22539 |
+
iteration 785/ 159576 | consumed samples: 12560 | elapsed time per iteration (ms): 14156.1 | learning rate: 3.484E-06 | global batch size: 16 | lm loss: 7.736514E+00 | loss scale: 8192.0 | grad norm: 90651.125 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22540 |
+
time (ms)
|
22541 |
+
iteration 786/ 159576 | consumed samples: 12576 | elapsed time per iteration (ms): 14206.7 | learning rate: 3.488E-06 | global batch size: 16 | lm loss: 7.744794E+00 | loss scale: 8192.0 | grad norm: 84355.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22542 |
+
time (ms)
|
22543 |
+
iteration 787/ 159576 | consumed samples: 12592 | elapsed time per iteration (ms): 13622.2 | learning rate: 3.493E-06 | global batch size: 16 | lm loss: 7.672806E+00 | loss scale: 8192.0 | grad norm: 51705.493 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22544 |
+
time (ms)
|
22545 |
+
iteration 788/ 159576 | consumed samples: 12608 | elapsed time per iteration (ms): 13771.2 | learning rate: 3.497E-06 | global batch size: 16 | lm loss: 7.713612E+00 | loss scale: 8192.0 | grad norm: 50748.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22546 |
+
time (ms)
|
22547 |
+
iteration 789/ 159576 | consumed samples: 12624 | elapsed time per iteration (ms): 14226.1 | learning rate: 3.501E-06 | global batch size: 16 | lm loss: 7.630927E+00 | loss scale: 8192.0 | grad norm: 68226.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22548 |
+
time (ms)
|
22549 |
+
iteration 790/ 159576 | consumed samples: 12640 | elapsed time per iteration (ms): 14175.2 | learning rate: 3.506E-06 | global batch size: 16 | lm loss: 7.523444E+00 | loss scale: 8192.0 | grad norm: 67731.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22550 |
+
time (ms)
|
22551 |
+
iteration 791/ 159576 | consumed samples: 12656 | elapsed time per iteration (ms): 13844.2 | learning rate: 3.510E-06 | global batch size: 16 | lm loss: 7.357096E+00 | loss scale: 8192.0 | grad norm: 45569.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22552 |
+
time (ms)
|
22553 |
+
iteration 792/ 159576 | consumed samples: 12672 | elapsed time per iteration (ms): 13884.3 | learning rate: 3.515E-06 | global batch size: 16 | lm loss: 7.701885E+00 | loss scale: 8192.0 | grad norm: 53017.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22554 |
+
time (ms)
|
22555 |
+
iteration 793/ 159576 | consumed samples: 12688 | elapsed time per iteration (ms): 14159.9 | learning rate: 3.519E-06 | global batch size: 16 | lm loss: 7.529918E+00 | loss scale: 8192.0 | grad norm: 55466.888 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22556 |
+
time (ms)
|
22557 |
+
iteration 794/ 159576 | consumed samples: 12704 | elapsed time per iteration (ms): 13975.0 | learning rate: 3.524E-06 | global batch size: 16 | lm loss: 7.684763E+00 | loss scale: 8192.0 | grad norm: 44801.760 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22558 |
+
time (ms)
|
22559 |
+
iteration 795/ 159576 | consumed samples: 12720 | elapsed time per iteration (ms): 13769.3 | learning rate: 3.528E-06 | global batch size: 16 | lm loss: 7.843237E+00 | loss scale: 8192.0 | grad norm: 59761.590 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22560 |
+
time (ms)
|
22561 |
+
iteration 796/ 159576 | consumed samples: 12736 | elapsed time per iteration (ms): 13954.1 | learning rate: 3.533E-06 | global batch size: 16 | lm loss: 7.737316E+00 | loss scale: 8192.0 | grad norm: 66240.870 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22562 |
+
time (ms)
|
22563 |
+
iteration 797/ 159576 | consumed samples: 12752 | elapsed time per iteration (ms): 13982.4 | learning rate: 3.537E-06 | global batch size: 16 | lm loss: 7.712746E+00 | loss scale: 8192.0 | grad norm: 53315.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22564 |
+
time (ms)
|
22565 |
+
iteration 798/ 159576 | consumed samples: 12768 | elapsed time per iteration (ms): 14164.1 | learning rate: 3.541E-06 | global batch size: 16 | lm loss: 7.649867E+00 | loss scale: 8192.0 | grad norm: 46451.967 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22566 |
+
time (ms)
|
22567 |
+
iteration 799/ 159576 | consumed samples: 12784 | elapsed time per iteration (ms): 14010.0 | learning rate: 3.546E-06 | global batch size: 16 | lm loss: 7.833376E+00 | loss scale: 8192.0 | grad norm: 65829.045 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22568 |
+
time (ms)
|
22569 |
+
iteration 800/ 159576 | consumed samples: 12800 | elapsed time per iteration (ms): 14307.9 | learning rate: 3.550E-06 | global batch size: 16 | lm loss: 7.790625E+00 | loss scale: 8192.0 | grad norm: 71968.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22570 |
+
time (ms)
|
22571 |
+
iteration 801/ 159576 | consumed samples: 12816 | elapsed time per iteration (ms): 13972.6 | learning rate: 3.555E-06 | global batch size: 16 | lm loss: 7.611866E+00 | loss scale: 8192.0 | grad norm: 48597.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22572 |
+
time (ms)
|
22573 |
+
iteration 802/ 159576 | consumed samples: 12832 | elapsed time per iteration (ms): 13959.0 | learning rate: 3.559E-06 | global batch size: 16 | lm loss: 7.617666E+00 | loss scale: 8192.0 | grad norm: 147672.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22574 |
+
time (ms)
|
22575 |
+
iteration 803/ 159576 | consumed samples: 12848 | elapsed time per iteration (ms): 13806.4 | learning rate: 3.564E-06 | global batch size: 16 | lm loss: 7.813154E+00 | loss scale: 8192.0 | grad norm: 121980.871 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22576 |
+
time (ms)
|
22577 |
+
iteration 804/ 159576 | consumed samples: 12864 | elapsed time per iteration (ms): 13949.2 | learning rate: 3.568E-06 | global batch size: 16 | lm loss: 7.654176E+00 | loss scale: 8192.0 | grad norm: 52351.960 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22578 |
+
time (ms)
|
22579 |
+
iteration 805/ 159576 | consumed samples: 12880 | elapsed time per iteration (ms): 13801.9 | learning rate: 3.572E-06 | global batch size: 16 | lm loss: 7.564305E+00 | loss scale: 8192.0 | grad norm: 62792.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22580 |
+
time (ms)
|
22581 |
+
iteration 806/ 159576 | consumed samples: 12896 | elapsed time per iteration (ms): 13954.3 | learning rate: 3.577E-06 | global batch size: 16 | lm loss: 7.707185E+00 | loss scale: 8192.0 | grad norm: 64767.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22582 |
+
time (ms)
|
22583 |
+
iteration 807/ 159576 | consumed samples: 12912 | elapsed time per iteration (ms): 14250.4 | learning rate: 3.581E-06 | global batch size: 16 | lm loss: 7.578569E+00 | loss scale: 8192.0 | grad norm: 73926.917 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22584 |
+
time (ms)
|
22585 |
+
iteration 808/ 159576 | consumed samples: 12928 | elapsed time per iteration (ms): 14201.0 | learning rate: 3.586E-06 | global batch size: 16 | lm loss: 7.631069E+00 | loss scale: 8192.0 | grad norm: 110069.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22586 |
+
time (ms)
|
22587 |
+
iteration 809/ 159576 | consumed samples: 12944 | elapsed time per iteration (ms): 13598.4 | learning rate: 3.590E-06 | global batch size: 16 | lm loss: 7.628491E+00 | loss scale: 8192.0 | grad norm: 49670.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22588 |
+
time (ms)
|
22589 |
+
iteration 810/ 159576 | consumed samples: 12960 | elapsed time per iteration (ms): 13941.6 | learning rate: 3.595E-06 | global batch size: 16 | lm loss: 7.759563E+00 | loss scale: 8192.0 | grad norm: 45971.027 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22590 |
+
time (ms)
|
22591 |
+
iteration 811/ 159576 | consumed samples: 12976 | elapsed time per iteration (ms): 14298.0 | learning rate: 3.599E-06 | global batch size: 16 | lm loss: 7.502759E+00 | loss scale: 8192.0 | grad norm: 77602.902 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22592 |
+
time (ms)
|
22593 |
+
iteration 812/ 159576 | consumed samples: 12992 | elapsed time per iteration (ms): 13416.1 | learning rate: 3.604E-06 | global batch size: 16 | lm loss: 7.624804E+00 | loss scale: 8192.0 | grad norm: 95989.772 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22594 |
+
time (ms)
|
22595 |
+
iteration 813/ 159576 | consumed samples: 13008 | elapsed time per iteration (ms): 13579.1 | learning rate: 3.608E-06 | global batch size: 16 | lm loss: 7.542982E+00 | loss scale: 8192.0 | grad norm: 52064.554 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22596 |
+
time (ms)
|
22597 |
+
iteration 814/ 159576 | consumed samples: 13024 | elapsed time per iteration (ms): 14100.2 | learning rate: 3.612E-06 | global batch size: 16 | lm loss: 7.676429E+00 | loss scale: 8192.0 | grad norm: 38221.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22598 |
+
time (ms)
|
22599 |
+
iteration 815/ 159576 | consumed samples: 13040 | elapsed time per iteration (ms): 14346.2 | learning rate: 3.617E-06 | global batch size: 16 | lm loss: 7.695131E+00 | loss scale: 8192.0 | grad norm: 57869.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22600 |
+
time (ms)
|
22601 |
+
iteration 816/ 159576 | consumed samples: 13056 | elapsed time per iteration (ms): 13771.7 | learning rate: 3.621E-06 | global batch size: 16 | lm loss: 7.578337E+00 | loss scale: 8192.0 | grad norm: 49771.695 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22602 |
+
time (ms)
|
22603 |
+
iteration 817/ 159576 | consumed samples: 13072 | elapsed time per iteration (ms): 13776.0 | learning rate: 3.626E-06 | global batch size: 16 | lm loss: 7.583301E+00 | loss scale: 8192.0 | grad norm: 46160.592 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22604 |
+
time (ms)
|
22605 |
+
iteration 818/ 159576 | consumed samples: 13088 | elapsed time per iteration (ms): 14040.8 | learning rate: 3.630E-06 | global batch size: 16 | lm loss: 7.773385E+00 | loss scale: 8192.0 | grad norm: 42207.098 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22606 |
+
time (ms)
|
22607 |
+
iteration 819/ 159576 | consumed samples: 13104 | elapsed time per iteration (ms): 13835.3 | learning rate: 3.635E-06 | global batch size: 16 | lm loss: 7.905573E+00 | loss scale: 8192.0 | grad norm: 111883.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22608 |
+
time (ms)
|
22609 |
+
iteration 820/ 159576 | consumed samples: 13120 | elapsed time per iteration (ms): 13924.4 | learning rate: 3.639E-06 | global batch size: 16 | lm loss: 7.730550E+00 | loss scale: 8192.0 | grad norm: 75433.173 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22610 |
+
time (ms)
|
22611 |
+
iteration 821/ 159576 | consumed samples: 13136 | elapsed time per iteration (ms): 13915.0 | learning rate: 3.643E-06 | global batch size: 16 | lm loss: 7.688564E+00 | loss scale: 8192.0 | grad norm: 41927.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22612 |
+
time (ms)
|
22613 |
+
iteration 822/ 159576 | consumed samples: 13152 | elapsed time per iteration (ms): 13890.4 | learning rate: 3.648E-06 | global batch size: 16 | lm loss: 7.552343E+00 | loss scale: 8192.0 | grad norm: 96543.909 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22614 |
+
time (ms)
|
22615 |
+
iteration 823/ 159576 | consumed samples: 13168 | elapsed time per iteration (ms): 13560.6 | learning rate: 3.652E-06 | global batch size: 16 | lm loss: 7.617982E+00 | loss scale: 8192.0 | grad norm: 56370.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22616 |
+
time (ms)
|
22617 |
+
iteration 824/ 159576 | consumed samples: 13184 | elapsed time per iteration (ms): 14024.1 | learning rate: 3.657E-06 | global batch size: 16 | lm loss: 7.600199E+00 | loss scale: 8192.0 | grad norm: 61928.907 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22618 |
+
time (ms)
|
22619 |
+
iteration 825/ 159576 | consumed samples: 13200 | elapsed time per iteration (ms): 14003.2 | learning rate: 3.661E-06 | global batch size: 16 | lm loss: 7.541789E+00 | loss scale: 8192.0 | grad norm: 56863.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22620 |
+
time (ms)
|
22621 |
+
iteration 826/ 159576 | consumed samples: 13216 | elapsed time per iteration (ms): 13848.3 | learning rate: 3.666E-06 | global batch size: 16 | lm loss: 7.782004E+00 | loss scale: 8192.0 | grad norm: 59985.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22622 |
+
time (ms)
|
22623 |
+
iteration 827/ 159576 | consumed samples: 13232 | elapsed time per iteration (ms): 13902.1 | learning rate: 3.670E-06 | global batch size: 16 | lm loss: 7.733065E+00 | loss scale: 8192.0 | grad norm: 39148.960 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22624 |
+
time (ms)
|
22625 |
+
iteration 828/ 159576 | consumed samples: 13248 | elapsed time per iteration (ms): 14356.1 | learning rate: 3.675E-06 | global batch size: 16 | lm loss: 7.625387E+00 | loss scale: 8192.0 | grad norm: 56612.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22626 |
+
time (ms)
|
22627 |
+
iteration 829/ 159576 | consumed samples: 13264 | elapsed time per iteration (ms): 14368.0 | learning rate: 3.679E-06 | global batch size: 16 | lm loss: 7.759684E+00 | loss scale: 8192.0 | grad norm: 67635.907 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22628 |
+
time (ms)
|
22629 |
+
iteration 830/ 159576 | consumed samples: 13280 | elapsed time per iteration (ms): 13627.9 | learning rate: 3.683E-06 | global batch size: 16 | lm loss: 7.694915E+00 | loss scale: 8192.0 | grad norm: 60776.045 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22630 |
+
time (ms)
|
22631 |
+
iteration 831/ 159576 | consumed samples: 13296 | elapsed time per iteration (ms): 13498.1 | learning rate: 3.688E-06 | global batch size: 16 | lm loss: 7.492978E+00 | loss scale: 8192.0 | grad norm: 42000.715 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22632 |
+
time (ms)
|
22633 |
+
iteration 832/ 159576 | consumed samples: 13312 | elapsed time per iteration (ms): 13938.9 | learning rate: 3.692E-06 | global batch size: 16 | lm loss: 7.616700E+00 | loss scale: 8192.0 | grad norm: 105579.700 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22634 |
+
time (ms)
|
22635 |
+
iteration 833/ 159576 | consumed samples: 13328 | elapsed time per iteration (ms): 13687.8 | learning rate: 3.697E-06 | global batch size: 16 | lm loss: 7.715961E+00 | loss scale: 8192.0 | grad norm: 78119.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22636 |
+
time (ms)
|
22637 |
+
iteration 834/ 159576 | consumed samples: 13344 | elapsed time per iteration (ms): 13717.8 | learning rate: 3.701E-06 | global batch size: 16 | lm loss: 7.778497E+00 | loss scale: 8192.0 | grad norm: 58326.728 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22638 |
+
time (ms)
|
22639 |
+
iteration 835/ 159576 | consumed samples: 13360 | elapsed time per iteration (ms): 13913.9 | learning rate: 3.706E-06 | global batch size: 16 | lm loss: 7.718093E+00 | loss scale: 8192.0 | grad norm: 48122.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22640 |
+
time (ms)
|
22641 |
+
iteration 836/ 159576 | consumed samples: 13376 | elapsed time per iteration (ms): 14318.5 | learning rate: 3.710E-06 | global batch size: 16 | lm loss: 7.521303E+00 | loss scale: 8192.0 | grad norm: 60082.150 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22642 |
+
time (ms)
|
22643 |
+
iteration 837/ 159576 | consumed samples: 13392 | elapsed time per iteration (ms): 13780.0 | learning rate: 3.714E-06 | global batch size: 16 | lm loss: 7.538383E+00 | loss scale: 8192.0 | grad norm: 61043.143 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22644 |
+
time (ms)
|
22645 |
+
iteration 838/ 159576 | consumed samples: 13408 | elapsed time per iteration (ms): 13961.2 | learning rate: 3.719E-06 | global batch size: 16 | lm loss: 7.548276E+00 | loss scale: 8192.0 | grad norm: 58423.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22646 |
+
time (ms)
|
22647 |
+
iteration 839/ 159576 | consumed samples: 13424 | elapsed time per iteration (ms): 14239.6 | learning rate: 3.723E-06 | global batch size: 16 | lm loss: 7.618182E+00 | loss scale: 8192.0 | grad norm: 48500.077 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22648 |
+
time (ms)
|
22649 |
+
iteration 840/ 159576 | consumed samples: 13440 | elapsed time per iteration (ms): 13752.3 | learning rate: 3.728E-06 | global batch size: 16 | lm loss: 7.595082E+00 | loss scale: 8192.0 | grad norm: 50825.625 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22650 |
+
time (ms)
|
22651 |
+
iteration 841/ 159576 | consumed samples: 13456 | elapsed time per iteration (ms): 14199.3 | learning rate: 3.732E-06 | global batch size: 16 | lm loss: 7.492725E+00 | loss scale: 8192.0 | grad norm: 56977.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22652 |
+
time (ms)
|
22653 |
+
iteration 842/ 159576 | consumed samples: 13472 | elapsed time per iteration (ms): 13925.4 | learning rate: 3.737E-06 | global batch size: 16 | lm loss: 7.783816E+00 | loss scale: 8192.0 | grad norm: 40797.888 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22654 |
+
time (ms)
|
22655 |
+
iteration 843/ 159576 | consumed samples: 13488 | elapsed time per iteration (ms): 14119.4 | learning rate: 3.741E-06 | global batch size: 16 | lm loss: 7.606951E+00 | loss scale: 8192.0 | grad norm: 50890.553 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22656 |
+
time (ms)
|
22657 |
+
iteration 844/ 159576 | consumed samples: 13504 | elapsed time per iteration (ms): 13941.8 | learning rate: 3.746E-06 | global batch size: 16 | lm loss: 7.638199E+00 | loss scale: 8192.0 | grad norm: 52652.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22658 |
+
time (ms)
|
22659 |
+
iteration 845/ 159576 | consumed samples: 13520 | elapsed time per iteration (ms): 14424.1 | learning rate: 3.750E-06 | global batch size: 16 | lm loss: 7.555171E+00 | loss scale: 8192.0 | grad norm: 48298.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22660 |
+
time (ms)
|
22661 |
+
iteration 846/ 159576 | consumed samples: 13536 | elapsed time per iteration (ms): 14202.9 | learning rate: 3.754E-06 | global batch size: 16 | lm loss: 7.651504E+00 | loss scale: 8192.0 | grad norm: 76618.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22662 |
+
time (ms)
|
22663 |
+
iteration 847/ 159576 | consumed samples: 13552 | elapsed time per iteration (ms): 13785.9 | learning rate: 3.759E-06 | global batch size: 16 | lm loss: 7.914087E+00 | loss scale: 8192.0 | grad norm: 40970.022 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22664 |
+
time (ms)
|
22665 |
+
iteration 848/ 159576 | consumed samples: 13568 | elapsed time per iteration (ms): 13892.7 | learning rate: 3.763E-06 | global batch size: 16 | lm loss: 7.714731E+00 | loss scale: 8192.0 | grad norm: 47666.946 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22666 |
+
time (ms)
|
22667 |
+
iteration 849/ 159576 | consumed samples: 13584 | elapsed time per iteration (ms): 13608.6 | learning rate: 3.768E-06 | global batch size: 16 | lm loss: 7.566309E+00 | loss scale: 8192.0 | grad norm: 56337.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22668 |
+
time (ms)
|
22669 |
+
iteration 850/ 159576 | consumed samples: 13600 | elapsed time per iteration (ms): 13752.1 | learning rate: 3.772E-06 | global batch size: 16 | lm loss: 7.621016E+00 | loss scale: 8192.0 | grad norm: 55695.680 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22670 |
+
time (ms)
|
22671 |
+
iteration 851/ 159576 | consumed samples: 13616 | elapsed time per iteration (ms): 13514.6 | learning rate: 3.777E-06 | global batch size: 16 | lm loss: 7.510153E+00 | loss scale: 8192.0 | grad norm: 70852.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22672 |
+
time (ms)
|
22673 |
+
iteration 852/ 159576 | consumed samples: 13632 | elapsed time per iteration (ms): 13536.1 | learning rate: 3.781E-06 | global batch size: 16 | lm loss: 7.417966E+00 | loss scale: 8192.0 | grad norm: 43169.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22674 |
+
time (ms)
|
22675 |
+
iteration 853/ 159576 | consumed samples: 13648 | elapsed time per iteration (ms): 14116.4 | learning rate: 3.786E-06 | global batch size: 16 | lm loss: 7.490001E+00 | loss scale: 8192.0 | grad norm: 61980.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22676 |
+
time (ms)
|
22677 |
+
iteration 854/ 159576 | consumed samples: 13664 | elapsed time per iteration (ms): 14372.8 | learning rate: 3.790E-06 | global batch size: 16 | lm loss: 7.555287E+00 | loss scale: 8192.0 | grad norm: 43650.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22678 |
+
time (ms)
|
22679 |
+
iteration 855/ 159576 | consumed samples: 13680 | elapsed time per iteration (ms): 13154.5 | learning rate: 3.794E-06 | global batch size: 16 | lm loss: 7.628311E+00 | loss scale: 8192.0 | grad norm: 32290.729 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22680 |
+
time (ms)
|
22681 |
+
iteration 856/ 159576 | consumed samples: 13696 | elapsed time per iteration (ms): 13509.6 | learning rate: 3.799E-06 | global batch size: 16 | lm loss: 7.757495E+00 | loss scale: 8192.0 | grad norm: 94063.051 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22682 |
+
time (ms)
|
22683 |
+
iteration 857/ 159576 | consumed samples: 13712 | elapsed time per iteration (ms): 14015.7 | learning rate: 3.803E-06 | global batch size: 16 | lm loss: 7.733263E+00 | loss scale: 8192.0 | grad norm: 53189.090 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22684 |
+
time (ms)
|
22685 |
+
iteration 858/ 159576 | consumed samples: 13728 | elapsed time per iteration (ms): 14357.8 | learning rate: 3.808E-06 | global batch size: 16 | lm loss: 7.570580E+00 | loss scale: 8192.0 | grad norm: 57239.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22686 |
+
time (ms)
|
22687 |
+
iteration 859/ 159576 | consumed samples: 13744 | elapsed time per iteration (ms): 13954.6 | learning rate: 3.812E-06 | global batch size: 16 | lm loss: 7.593122E+00 | loss scale: 8192.0 | grad norm: 45414.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22688 |
+
time (ms)
|
22689 |
+
iteration 860/ 159576 | consumed samples: 13760 | elapsed time per iteration (ms): 14212.3 | learning rate: 3.817E-06 | global batch size: 16 | lm loss: 7.571471E+00 | loss scale: 8192.0 | grad norm: 75659.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22690 |
+
time (ms)
|
22691 |
+
iteration 861/ 159576 | consumed samples: 13776 | elapsed time per iteration (ms): 14044.0 | learning rate: 3.821E-06 | global batch size: 16 | lm loss: 7.599829E+00 | loss scale: 8192.0 | grad norm: 47651.114 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22692 |
+
time (ms)
|
22693 |
+
iteration 862/ 159576 | consumed samples: 13792 | elapsed time per iteration (ms): 13529.5 | learning rate: 3.825E-06 | global batch size: 16 | lm loss: 7.427186E+00 | loss scale: 8192.0 | grad norm: 76377.661 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22694 |
+
time (ms)
|
22695 |
+
iteration 863/ 159576 | consumed samples: 13808 | elapsed time per iteration (ms): 14057.3 | learning rate: 3.830E-06 | global batch size: 16 | lm loss: 7.736305E+00 | loss scale: 8192.0 | grad norm: 76320.820 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22696 |
+
time (ms)
|
22697 |
+
iteration 864/ 159576 | consumed samples: 13824 | elapsed time per iteration (ms): 14064.2 | learning rate: 3.834E-06 | global batch size: 16 | lm loss: 7.637553E+00 | loss scale: 8192.0 | grad norm: 56695.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22698 |
+
time (ms)
|
22699 |
+
iteration 865/ 159576 | consumed samples: 13840 | elapsed time per iteration (ms): 14009.0 | learning rate: 3.839E-06 | global batch size: 16 | lm loss: 7.709378E+00 | loss scale: 8192.0 | grad norm: 77647.024 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22700 |
+
time (ms)
|
22701 |
+
iteration 866/ 159576 | consumed samples: 13856 | elapsed time per iteration (ms): 13951.3 | learning rate: 3.843E-06 | global batch size: 16 | lm loss: 7.856131E+00 | loss scale: 8192.0 | grad norm: 85925.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22702 |
+
time (ms)
|
22703 |
+
iteration 867/ 159576 | consumed samples: 13872 | elapsed time per iteration (ms): 14427.4 | learning rate: 3.848E-06 | global batch size: 16 | lm loss: 7.511599E+00 | loss scale: 8192.0 | grad norm: 50353.044 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22704 |
+
time (ms)
|
22705 |
+
iteration 868/ 159576 | consumed samples: 13888 | elapsed time per iteration (ms): 14117.9 | learning rate: 3.852E-06 | global batch size: 16 | lm loss: 7.803133E+00 | loss scale: 8192.0 | grad norm: 73334.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
22706 |
+
time (ms)
|