bigscience-bot
committed on
Commit
•
61ec961
1
Parent(s):
5e5cb49
new data
- logs/main_log.txt +55 -0
logs/main_log.txt
CHANGED
@@ -55327,3 +55327,58 @@ time (ms)
55327 |
time (ms)
|
55328 |
iteration 5930/ 159576 | consumed samples: 174768 | elapsed time per iteration (ms): 16796.3 | learning rate: 4.834E-05 | global batch size: 64 | lm loss: 6.373956E+00 | loss scale: 2048.0 | grad norm: 64436.905 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
55329 |
time (ms)
|
55330 |
+
[2021-09-25 03:08:32] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1185639_[1-10%1] on 'gpu_p13' partition)
|
55331 |
+
[2021-09-25 03:08:32] PULSE: tr8-104B is running for 33:04 since 2021-09-25T02:35:28 (1185609 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n[0,7-8])
|
55332 |
+
iteration 5940/ 159576 | consumed samples: 175408 | elapsed time per iteration (ms): 16680.4 | learning rate: 4.851E-05 | global batch size: 64 | lm loss: 6.367229E+00 | loss scale: 2048.0 | grad norm: 61103.619 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
55333 |
+
time (ms)
|
55334 |
+
iteration 5950/ 159576 | consumed samples: 176048 | elapsed time per iteration (ms): 16548.2 | learning rate: 4.869E-05 | global batch size: 64 | lm loss: 6.365273E+00 | loss scale: 2048.0 | grad norm: 74137.806 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
55335 |
+
time (ms)
|
55336 |
+
iteration 5960/ 159576 | consumed samples: 176688 | elapsed time per iteration (ms): 16720.7 | learning rate: 4.887E-05 | global batch size: 64 | lm loss: 6.339179E+00 | loss scale: 2048.0 | grad norm: 117906.851 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
55337 |
+
time (ms)
|
55338 |
+
iteration 5970/ 159576 | consumed samples: 177328 | elapsed time per iteration (ms): 16666.6 | learning rate: 4.905E-05 | global batch size: 64 | lm loss: 6.366007E+00 | loss scale: 2048.0 | grad norm: 135736.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
55339 |
+
time (ms)
|
55340 |
+
iteration 5980/ 159576 | consumed samples: 177968 | elapsed time per iteration (ms): 16712.0 | learning rate: 4.922E-05 | global batch size: 64 | lm loss: 6.311417E+00 | loss scale: 2048.0 | grad norm: 59672.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
55341 |
+
time (ms)
|
55342 |
+
iteration 5990/ 159576 | consumed samples: 178608 | elapsed time per iteration (ms): 16795.9 | learning rate: 4.940E-05 | global batch size: 64 | lm loss: 6.346366E+00 | loss scale: 2048.0 | grad norm: 70394.026 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
55343 |
+
time (ms)
|
55344 |
+
[2021-09-25 03:26:24,359] [INFO] [logging.py:68:log_dist] [Rank 0] step=6000, skipped=13, lr=[4.9579881656804734e-05, 4.9579881656804734e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
|
55345 |
+
steps: 6000 loss: 6.4051 iter time (s): 0.008 samples/sec: 7888.018
|
55346 |
+
iteration 6000/ 159576 | consumed samples: 179248 | elapsed time per iteration (ms): 16825.1 | learning rate: 4.958E-05 | global batch size: 64 | lm loss: 6.338142E+00 | loss scale: 2048.0 | grad norm: 51469.855 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
55347 |
+
time (ms)
|
55348 |
+
------------------------------------------------------------------------------------------------
|
55349 |
+
validation loss at iteration 6000 | lm loss value: 6.305492E+00 | lm loss PPL: 5.475711E+02 |
|
55350 |
+
------------------------------------------------------------------------------------------------
|
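Note on the validation line above: the logged `lm loss PPL` appears to be simply the exponential of the logged `lm loss value`. The snippet below is a minimal check of that relationship, using only the two numbers printed in the log. The variable names are illustrative and not taken from the training code.

```python
import math

# Values copied from the validation line at iteration 6000.
val_loss = 6.305492        # "lm loss value"
logged_ppl = 5.475711e02   # "lm loss PPL"

# Perplexity is assumed here to be exp(mean cross-entropy loss in nats).
ppl = math.exp(val_loss)

# The difference should be small, limited by the rounding of the logged loss.
print(f"recomputed PPL: {ppl:.4f}, logged PPL: {logged_ppl:.4f}")
```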
55351 |
+
saving checkpoint at iteration 6000 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
|
55352 |
+
[2021-09-25 03:26:46,630] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints/global_step6000/mp_rank_00_model_states.pt
|
55353 |
+
successfully saved checkpoint at iteration 6000 to /gpfsscratch/rech/six/commun/checkpoints/tr8-104B/checkpoints
|
55354 |
+
time (ms) | save-checkpoint: 18535.85
|
55355 |
+
iteration 6010/ 159576 | consumed samples: 179888 | elapsed time per iteration (ms): 19605.0 | learning rate: 4.976E-05 | global batch size: 64 | lm loss: 6.332598E+00 | loss scale: 2048.0 | grad norm: 64216.775 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
55356 |
+
time (ms)
|
55357 |
+
iteration 6020/ 159576 | consumed samples: 180528 | elapsed time per iteration (ms): 16682.2 | learning rate: 4.993E-05 | global batch size: 64 | lm loss: 6.346989E+00 | loss scale: 2048.0 | grad norm: 65052.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
55358 |
+
time (ms)
|
55359 |
+
iteration 6030/ 159576 | consumed samples: 181168 | elapsed time per iteration (ms): 16536.1 | learning rate: 5.011E-05 | global batch size: 64 | lm loss: 6.314711E+00 | loss scale: 2048.0 | grad norm: 61186.621 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
55360 |
+
time (ms)
|
55361 |
+
iteration 6040/ 159576 | consumed samples: 181808 | elapsed time per iteration (ms): 16509.4 | learning rate: 5.029E-05 | global batch size: 64 | lm loss: 6.347876E+00 | loss scale: 2048.0 | grad norm: 80684.961 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
55362 |
+
time (ms)
|
55363 |
+
iteration 6050/ 159576 | consumed samples: 182448 | elapsed time per iteration (ms): 16821.6 | learning rate: 5.047E-05 | global batch size: 64 | lm loss: 6.345741E+00 | loss scale: 2048.0 | grad norm: 207970.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
55364 |
+
time (ms)
|
55365 |
+
iteration 6060/ 159576 | consumed samples: 183088 | elapsed time per iteration (ms): 16815.3 | learning rate: 5.064E-05 | global batch size: 64 | lm loss: 6.341463E+00 | loss scale: 2048.0 | grad norm: 57913.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
55366 |
+
time (ms)
|
55367 |
+
iteration 6070/ 159576 | consumed samples: 183728 | elapsed time per iteration (ms): 16825.8 | learning rate: 5.082E-05 | global batch size: 64 | lm loss: 6.336625E+00 | loss scale: 2048.0 | grad norm: 62496.040 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
55368 |
+
time (ms)
|
55369 |
+
iteration 6080/ 159576 | consumed samples: 184368 | elapsed time per iteration (ms): 16749.3 | learning rate: 5.100E-05 | global batch size: 64 | lm loss: 6.378619E+00 | loss scale: 2048.0 | grad norm: 53421.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
55370 |
+
time (ms)
|
55371 |
+
iteration 6090/ 159576 | consumed samples: 185008 | elapsed time per iteration (ms): 16844.2 | learning rate: 5.118E-05 | global batch size: 64 | lm loss: 6.363810E+00 | loss scale: 2048.0 | grad norm: 53621.070 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
55372 |
+
time (ms)
|
55373 |
+
iteration 6100/ 159576 | consumed samples: 185648 | elapsed time per iteration (ms): 16803.1 | learning rate: 5.136E-05 | global batch size: 64 | lm loss: 6.397610E+00 | loss scale: 2048.0 | grad norm: 63234.859 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
55374 |
+
time (ms)
|
55375 |
+
iteration 6110/ 159576 | consumed samples: 186288 | elapsed time per iteration (ms): 16808.5 | learning rate: 5.153E-05 | global batch size: 64 | lm loss: 6.359557E+00 | loss scale: 2048.0 | grad norm: 52582.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
55376 |
+
time (ms)
|
55377 |
+
iteration 6120/ 159576 | consumed samples: 186928 | elapsed time per iteration (ms): 16792.9 | learning rate: 5.171E-05 | global batch size: 64 | lm loss: 6.347573E+00 | loss scale: 2048.0 | grad norm: 50959.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
55378 |
+
time (ms)
|
55379 |
+
iteration 6130/ 159576 | consumed samples: 187568 | elapsed time per iteration (ms): 16806.7 | learning rate: 5.189E-05 | global batch size: 64 | lm loss: 6.351057E+00 | loss scale: 2048.0 | grad norm: 152670.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
55380 |
+
time (ms)
|
55381 |
+
iteration 6140/ 159576 | consumed samples: 188208 | elapsed time per iteration (ms): 16808.0 | learning rate: 5.207E-05 | global batch size: 64 | lm loss: 6.374673E+00 | loss scale: 2048.0 | grad norm: 50742.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
55382 |
+
time (ms)
|
55383 |
+
[2021-09-25 04:08:28] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1185639_[1-10%1] on 'gpu_p13' partition)
|
55384 |
+
[2021-09-25 04:08:28] PULSE: tr8-104B is running for 1:33:00 since 2021-09-25T02:35:28 (1185609 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n[0,7-8])
|
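For readers who want to work with these log lines programmatically, the following is a minimal sketch of a parser for the `iteration N/ TOTAL | key: value | ...` lines. It assumes the field layout seen in this diff; the function name `parse_iteration_line` and the regular expressions are hypothetical helpers, not part of Megatron-DeepSpeed or the logging code that produced this file.

```python
import re

# Matches the leading "iteration  5940/  159576 |" part of a training line.
ITER_RE = re.compile(r"iteration\s+(\d+)/\s*(\d+)\s*\|")
# Matches each following "key: numeric value |" field.
FIELD_RE = re.compile(r"([^|:]+):\s*([-+0-9.eE]+)\s*\|")


def parse_iteration_line(line):
    """Return a dict of numeric fields, or None if this is not an iteration line."""
    match = ITER_RE.search(line)
    if match is None:
        return None
    record = {
        "iteration": int(match.group(1)),
        "total_iterations": int(match.group(2)),
    }
    # Parse the remaining pipe-separated fields as floats.
    for key, value in FIELD_RE.findall(line[match.end():]):
        record[key.strip()] = float(value)
    return record


# Sample line copied from the log above.
SAMPLE = (
    "iteration 5940/ 159576 | consumed samples: 175408 | "
    "elapsed time per iteration (ms): 16680.4 | learning rate: 4.851E-05 | "
    "global batch size: 64 | lm loss: 6.367229E+00 | loss scale: 2048.0 | "
    "grad norm: 61103.619 | num zeros: 0.0 | "
    "number of skipped iterations: 0 | number of nan iterations: 0 |"
)

record = parse_iteration_line(SAMPLE)
print(record["iteration"], record["lm loss"], record["grad norm"])
```

Parsed records make simple consistency checks easy. For example, consecutive log lines are 10 iterations apart at a global batch size of 64, so `consumed samples` should grow by 640 between them (174768 at iteration 5930, 175408 at iteration 5940). Lines that are not training iterations, such as the validation and checkpoint lines, return `None`.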