bigscience-bot committed on
Commit
939caa5
1 Parent(s): ab275a7
Files changed (1)
  1. logs/main_log.txt +29 -0
logs/main_log.txt CHANGED
@@ -99875,3 +99875,32 @@ time (ms)
99875
  time (ms)
99876
  iteration 8420/ 159576 | consumed samples: 435760 | elapsed time per iteration (ms): 22797.0 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.923830E+00 | loss scale: 1024.0 | grad norm: 16621.010 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
99877
  time (ms)
99878
+ iteration 8430/ 159576 | consumed samples: 437360 | elapsed time per iteration (ms): 23019.9 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.940806E+00 | loss scale: 1024.0 | grad norm: 15050.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
99879
+ time (ms)
99880
+ iteration 8440/ 159576 | consumed samples: 438960 | elapsed time per iteration (ms): 23026.2 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.984757E+00 | loss scale: 1024.0 | grad norm: 22968.730 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
99881
+ time (ms)
99882
+ iteration 8450/ 159576 | consumed samples: 440560 | elapsed time per iteration (ms): 22903.0 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.970832E+00 | loss scale: 1024.0 | grad norm: 25206.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
99883
+ time (ms)
99884
+ iteration 8460/ 159576 | consumed samples: 442160 | elapsed time per iteration (ms): 22992.7 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 6.992513E+00 | loss scale: 1024.0 | grad norm: 9219.678 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
99885
+ time (ms)
99886
+ iteration 8470/ 159576 | consumed samples: 443760 | elapsed time per iteration (ms): 23036.6 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.053975E+00 | loss scale: 1024.0 | grad norm: 9743.104 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
99887
+ time (ms)
99888
+ iteration 8480/ 159576 | consumed samples: 445360 | elapsed time per iteration (ms): 22710.5 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.087634E+00 | loss scale: 1024.0 | grad norm: 36403.836 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
99889
+ time (ms)
99890
+ iteration 8490/ 159576 | consumed samples: 446960 | elapsed time per iteration (ms): 22994.9 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.142048E+00 | loss scale: 1024.0 | grad norm: 8807.945 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
99891
+ time (ms)
99892
+ iteration 8500/ 159576 | consumed samples: 448560 | elapsed time per iteration (ms): 22707.3 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.160313E+00 | loss scale: 1024.0 | grad norm: 9148.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
99893
+ time (ms)
99894
+ iteration 8510/ 159576 | consumed samples: 450160 | elapsed time per iteration (ms): 22963.9 | learning rate: 6.000E-05 | global batch size: 160 | lm loss: 7.277474E+00 | loss scale: 1024.0 | grad norm: 43448.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
99895
+ time (ms)
99896
+ iteration 8520/ 159576 | consumed samples: 451760 | elapsed time per iteration (ms): 19193.8 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 64.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
99897
+ time (ms)
99898
+ iteration 8530/ 159576 | consumed samples: 453360 | elapsed time per iteration (ms): 15554.5 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
99899
+ time (ms)
99900
+ iteration 8540/ 159576 | consumed samples: 454960 | elapsed time per iteration (ms): 15434.8 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
99901
+ time (ms)
99902
+ iteration 8550/ 159576 | consumed samples: 456560 | elapsed time per iteration (ms): 15729.0 | learning rate: 6.000E-05 | global batch size: 160 | loss scale: 1.0 | grad norm: 5533.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
99903
+ time (ms)
99904
+ [2021-09-28 06:32:50] PULSE: tr8-104B is scheduled to start in 17:29:26 (at 2021-09-29T00:02:17) (1277218 on 'gpu_p13' partition)
99905
+ [2021-09-28 06:32:50] PULSE: tr8-104B is waiting for the previous job to finish before scheduling a new one using the dependency mechanism (1277295_[1-10%1] on 'gpu_p13' partition)
99906
+ [2021-09-28 06:32:50] PULSE: tr8-104B is running for 12:49:24 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
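The iteration records above follow a regular `iteration N/ TOTAL | key: value | ...` layout, so they can be checked mechanically, for example to spot the records from iteration 8520 onward where the `lm loss` field is absent and the loss scale has dropped. The following is a minimal sketch, not part of the training code: the function names `parse_iteration_line` and `is_missing_loss` are hypothetical, and it assumes every field value after the header is numeric, as it is in the lines shown here.

```python
import re


def parse_iteration_line(line):
    """Parse one 'iteration N/ TOTAL | key: value | ...' log line into a dict.

    Returns None for lines that are not iteration records (for example
    'time (ms)' or PULSE messages). Hypothetical helper for illustration.
    """
    # Drop the leading '+' diff marker and surrounding whitespace, if present.
    line = line.lstrip("+ ").strip()
    parts = [part.strip() for part in line.split("|") if part.strip()]
    if not parts:
        return None
    match = re.match(r"iteration\s+(\d+)/\s*(\d+)$", parts[0])
    if match is None:
        return None
    record = {
        "iteration": int(match.group(1)),
        "total_iterations": int(match.group(2)),
    }
    for field in parts[1:]:
        # Split on the last ':' so keys such as
        # 'elapsed time per iteration (ms)' stay intact.
        key, _, value = field.rpartition(":")
        record[key.strip()] = float(value)  # assumes numeric values
    return record


def is_missing_loss(record):
    """True when an iteration record carries no 'lm loss' field."""
    return record is not None and "lm loss" not in record
```

Applied to the lines in this diff, `is_missing_loss` would return `False` for iterations up to 8510 and `True` for 8520 through 8550, and the `consumed samples` values can be compared between consecutive records against 10 iterations times the global batch size of 160.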