bigscience-bot committed 939caa5 (parent: ab275a7): "new data"

logs/main_log.txt (CHANGED, +29 -0)
@@ -99875,3 +99875,32 @@ time (ms)
 time (ms)
 iteration 8420/ 159576 | consumed samples: 435760 | elapsed time per iteration (ms): 22797.0 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.923830E+00 | loss scale: 1024.0 | grad norm: 16621.010 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 time (ms)
+iteration 8430/ 159576 | consumed samples: 437360 | elapsed time per iteration (ms): 23019.9 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.940806E+00 | loss scale: 1024.0 | grad norm: 15050.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 8440/ 159576 | consumed samples: 438960 | elapsed time per iteration (ms): 23026.2 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.984757E+00 | loss scale: 1024.0 | grad norm: 22968.730 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 8450/ 159576 | consumed samples: 440560 | elapsed time per iteration (ms): 22903.0 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.970832E+00 | loss scale: 1024.0 | grad norm: 25206.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 8460/ 159576 | consumed samples: 442160 | elapsed time per iteration (ms): 22992.7 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.992513E+00 | loss scale: 1024.0 | grad norm: 9219.678 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 8470/ 159576 | consumed samples: 443760 | elapsed time per iteration (ms): 23036.6 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.053975E+00 | loss scale: 1024.0 | grad norm: 9743.104 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 8480/ 159576 | consumed samples: 445360 | elapsed time per iteration (ms): 22710.5 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.087634E+00 | loss scale: 1024.0 | grad norm: 36403.836 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 8490/ 159576 | consumed samples: 446960 | elapsed time per iteration (ms): 22994.9 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.142048E+00 | loss scale: 1024.0 | grad norm: 8807.945 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 8500/ 159576 | consumed samples: 448560 | elapsed time per iteration (ms): 22707.3 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.160313E+00 | loss scale: 1024.0 | grad norm: 9148.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 8510/ 159576 | consumed samples: 450160 | elapsed time per iteration (ms): 22963.9 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.277474E+00 | loss scale: 1024.0 | grad norm: 43448.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 8520/ 159576 | consumed samples: 451760 | elapsed time per iteration (ms): 19193.8 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 64.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 8530/ 159576 | consumed samples: 453360 | elapsed time per iteration (ms): 15554.5 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 8540/ 159576 | consumed samples: 454960 | elapsed time per iteration (ms): 15434.8 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+iteration 8550/ 159576 | consumed samples: 456560 | elapsed time per iteration (ms): 15729.0 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
+time (ms)
+[2021-09-28 06:32:50] PULSE: tr8-104B is scheduled to start in 17:29:26 (at 2021-09-29T00:02:17) (1277218 on 'gpu_p13' partition)
+[2021-09-28 06:32:50] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1277295_[1-10%1] on 'gpu_p13' partition)
+[2021-09-28 06:32:50] PULSE: tr8-104B is running for 12:49:24 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
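Each iteration record in this log uses a fixed `key: value |` layout, so it can be scanned mechanically. The following is a minimal sketch in Python that flags the change starting at iteration 8520. At that point the `lm loss` field disappears, `loss scale` falls from 1024.0 to 64.0 and then to 1.0, and `grad norm` repeats 5533.127. The sketch also checks that `consumed samples` grows by 10 iterations × 160 = 1600 between records. The function names `parse_record`, `flag_records` and `check_sample_counts` and the 1024.0 reference scale are hypothetical choices for illustration. They are not part of Megatron-DeepSpeed or the BigScience tooling.

```python
import re
from typing import Dict, List, Optional, Tuple

# Matches the "iteration 8420/ 159576" prefix of a training record.
ITER_RE = re.compile(r"iteration\s+(\d+)/\s*(\d+)")


def parse_record(line: str) -> Optional[Dict[str, float]]:
    """Turn one 'iteration N/ M | key: value | ...' line into a dict.

    Returns None for any other line, e.g. 'time (ms)' or PULSE status lines.
    """
    m = ITER_RE.search(line)
    if m is None:
        return None
    record: Dict[str, float] = {
        "iteration": float(m.group(1)),
        "train iters": float(m.group(2)),
    }
    for field in line.split("|")[1:]:
        key, sep, value = field.partition(":")
        if not sep:
            continue  # the trailing '|' leaves an empty field
        record[key.strip()] = float(value.strip())
    return record


def flag_records(records: List[Dict[str, float]],
                 expected_scale: float = 1024.0) -> List[Tuple[int, str]]:
    """Return (iteration, reasons) for records that look unhealthy.

    expected_scale is a hypothetical baseline taken from the healthy records.
    """
    flagged: List[Tuple[int, str]] = []
    prev: Optional[Dict[str, float]] = None
    for rec in records:
        reasons = []
        if "lm loss" not in rec:
            reasons.append("lm loss missing")
        scale = rec.get("loss scale")
        if scale is not None and scale < expected_scale:
            reasons.append(f"loss scale {scale:g}")
        if (prev is not None and "grad norm" in rec
                and rec["grad norm"] == prev.get("grad norm")):
            reasons.append("grad norm unchanged")
        if reasons:
            flagged.append((int(rec["iteration"]), "; ".join(reasons)))
        prev = rec
    return flagged


def check_sample_counts(records: List[Dict[str, float]]) -> List[int]:
    """Return iterations where consumed samples did not grow by
    (iterations elapsed) x (global batch size); assumes a constant batch size."""
    bad: List[int] = []
    for prev, cur in zip(records, records[1:]):
        steps = cur["iteration"] - prev["iteration"]
        expected = prev["consumed samples"] + steps * cur["global batch size"]
        if cur["consumed samples"] != expected:
            bad.append(int(cur["iteration"]))
    return bad


# Three records copied from the log above, with the interleaved 'time (ms)' lines.
SAMPLE = [
    "iteration 8510/ 159576 | consumed samples: 450160 | elapsed time per iteration (ms): 22963.9 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.277474E+00 | loss scale: 1024.0 | grad norm: 43448.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |",
    "time (ms)",
    "iteration 8520/ 159576 | consumed samples: 451760 | elapsed time per iteration (ms): 19193.8 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 64.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |",
    "time (ms)",
    "iteration 8530/ 159576 | consumed samples: 453360 | elapsed time per iteration (ms): 15554.5 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |",
]

if __name__ == "__main__":
    recs = [r for r in map(parse_record, SAMPLE) if r is not None]
    for iteration, why in flag_records(recs):
        print(iteration, why)
    print("sample-count mismatches:", check_sample_counts(recs))
```

On these sample lines, the script flags iterations 8520 and 8530 and reports no sample-count mismatches. The fixed 1024.0 threshold is only an assumption. Loss scaling can raise or lower the scale during normal training, so a real monitor would more likely compare each value against a rolling baseline.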