bigscience-bot committed on
Commit
b86e38f
1 Parent(s): e128def
Files changed (1)
  1. logs/main_log.txt +19 -0
logs/main_log.txt CHANGED
@@ -99617,3 +99617,22 @@ time (ms)
99617
  time (ms)
99618
  iteration 7250/ 159576 | consumed samples: 285152 | elapsed time per iteration (ms): 19421.9 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.554094E+00 | loss scale: 2048.0 | grad norm: 79780.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
99619
  time (ms)
99620
+ iteration 7260/ 159576 | consumed samples: 286272 | elapsed time per iteration (ms): 19643.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.545351E+00 | loss scale: 2048.0 | grad norm: 153165.121 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
99621
+ time (ms)
99622
+ iteration 7270/ 159576 | consumed samples: 287392 | elapsed time per iteration (ms): 19873.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.548807E+00 | loss scale: 2048.0 | grad norm: 96725.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
99623
+ time (ms)
99624
+ iteration 7280/ 159576 | consumed samples: 288512 | elapsed time per iteration (ms): 19830.3 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.532312E+00 | loss scale: 2048.0 | grad norm: 85054.846 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
99625
+ time (ms)
99626
+ iteration 7290/ 159576 | consumed samples: 289632 | elapsed time per iteration (ms): 19469.1 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.535855E+00 | loss scale: 2048.0 | grad norm: 66255.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
99627
+ time (ms)
99628
+ iteration 7300/ 159576 | consumed samples: 290752 | elapsed time per iteration (ms): 19578.9 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.583752E+00 | loss scale: 2048.0 | grad norm: 61901.507 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
99629
+ time (ms)
99630
+ iteration 7310/ 159576 | consumed samples: 291872 | elapsed time per iteration (ms): 19646.2 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.539584E+00 | loss scale: 2048.0 | grad norm: 68238.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
99631
+ time (ms)
99632
+ iteration 7320/ 159576 | consumed samples: 292992 | elapsed time per iteration (ms): 19642.5 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.526649E+00 | loss scale: 2048.0 | grad norm: 69527.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
99633
+ time (ms)
99634
+ iteration 7330/ 159576 | consumed samples: 294112 | elapsed time per iteration (ms): 19508.3 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.514026E+00 | loss scale: 2048.0 | grad norm: 63745.755 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
99635
+ time (ms)
99636
+ iteration 7340/ 159576 | consumed samples: 295232 | elapsed time per iteration (ms): 19676.4 | learning rate: 6.000E-05 | global batch size: 112 | lm loss: 6.519949E+00 | loss scale: 2048.0 | grad norm: 96730.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
99637
+ time (ms)
99638
+ [2021-09-27 23:32:04] PULSE: tr8-104B is running for 5:48:38 since 2021-09-27T17:43:26 (1271196 on 'gpu_p13' partition (r7i7n[6-8],r8i0n[0-8],r8i1n[0-4],r8i7n[3-8],r9i0n[0-6,8],r9i1n[0-8],r9i2n0,r9i4n8,r9i5n[0-8],r9i6n[0-8],r9i7n[3-6])
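
The iteration lines above share a regular `key: value | key: value` layout, so they can be read mechanically. The following is a minimal sketch, not part of the training code: the helper name `parse_iteration_line` and the two sample strings are assumptions made for illustration, with the samples copied from the log entries for iterations 7250 and 7260. It also checks that consumed samples grow by the global batch size times the number of iterations between two entries.

```python
import re


def parse_iteration_line(line):
    """Parse one 'iteration N/ TOTAL | key: value | ...' log line into a dict.

    Hypothetical helper for illustration. Returns None for lines that are
    not iteration records (for example 'time (ms)' or PULSE lines).
    """
    # Drop a leading diff marker ('+') and surrounding whitespace, if present.
    line = line.strip().lstrip("+").strip()
    match = re.match(r"iteration\s+(\d+)/\s*(\d+)\s*\|(.*)", line)
    if match is None:
        return None
    record = {
        "iteration": int(match.group(1)),
        "total_iterations": int(match.group(2)),
    }
    # Remaining fields look like ' lm loss: 6.554094E+00 '.
    for field in match.group(3).split("|"):
        if ":" not in field:
            continue  # skips the empty piece after the trailing '|'
        key, value = field.split(":", 1)
        record[key.strip()] = float(value)
    return record


# Two entries copied from the log above (abbreviated to a few fields).
first = parse_iteration_line(
    "iteration 7250/ 159576 | consumed samples: 285152 | "
    "global batch size: 112 | lm loss: 6.554094E+00 | grad norm: 79780.312 |"
)
second = parse_iteration_line(
    "+ iteration 7260/ 159576 | consumed samples: 286272 | "
    "global batch size: 112 | lm loss: 6.545351E+00 | grad norm: 153165.121 |"
)

# Consumed samples should advance by batch size * iterations elapsed.
steps = second["iteration"] - first["iteration"]
samples = second["consumed samples"] - first["consumed samples"]
print(steps, samples, samples == steps * first["global batch size"])
```

Lines that do not start with `iteration`, such as the `time (ms)` separators and the PULSE status line, return `None` and can be filtered out before plotting or aggregating.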