bigscience-bot committed on
Commit
1338a52
·
1 Parent(s): 0652076
Files changed (1)
  1. logs/main_log.txt +12 -0
logs/main_log.txt CHANGED
@@ -55315,3 +55315,15 @@ time (ms)
55315
  time (ms)
55316
  iteration 5870/ 159576 | consumed samples: 170928 | elapsed time per iteration (ms): 16844.9 | learning rate: 4.727E-05 | global batch size: 64 | lm loss: 6.372821E+00 | loss scale: 2048.0 | grad norm: 49107.159 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
55317
  time (ms)
55318
+ iteration 5880/ 159576 | consumed samples: 171568 | elapsed time per iteration (ms): 16812.2 | learning rate: 4.745E-05 | global batch size: 64 | lm loss: 6.379050E+00 | loss scale: 2048.0 | grad norm: 76898.126 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
55319
+ time (ms)
55320
+ iteration 5890/ 159576 | consumed samples: 172208 | elapsed time per iteration (ms): 16819.7 | learning rate: 4.763E-05 | global batch size: 64 | lm loss: 6.333071E+00 | loss scale: 2048.0 | grad norm: 69874.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
55321
+ time (ms)
55322
+ iteration 5900/ 159576 | consumed samples: 172848 | elapsed time per iteration (ms): 16821.3 | learning rate: 4.780E-05 | global batch size: 64 | lm loss: 6.354385E+00 | loss scale: 2048.0 | grad norm: 57915.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
55323
+ time (ms)
55324
+ iteration 5910/ 159576 | consumed samples: 173488 | elapsed time per iteration (ms): 16679.9 | learning rate: 4.798E-05 | global batch size: 64 | lm loss: 6.361916E+00 | loss scale: 2048.0 | grad norm: 56535.869 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
55325
+ time (ms)
55326
+ iteration 5920/ 159576 | consumed samples: 174128 | elapsed time per iteration (ms): 16731.8 | learning rate: 4.816E-05 | global batch size: 64 | lm loss: 6.371978E+00 | loss scale: 2048.0 | grad norm: 75613.913 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
55327
+ time (ms)
55328
+ iteration 5930/ 159576 | consumed samples: 174768 | elapsed time per iteration (ms): 16796.3 | learning rate: 4.834E-05 | global batch size: 64 | lm loss: 6.373956E+00 | loss scale: 2048.0 | grad norm: 64436.905 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
55329
+ time (ms)
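The added lines share one pipe-delimited layout, so they can be read into records for plotting or sanity checks. The following is a minimal sketch, not part of the training code: `parse_log_line` is a hypothetical helper name, and it assumes every field other than `iteration` has the form `name: number`, as in the lines above. The sample lines in the usage part are shortened to their first two fields.

```python
import re


def parse_log_line(line: str) -> dict:
    """Split one pipe-delimited training log line into a {field: value} dict."""
    fields = {}
    # Drop the leading diff marker ("+") if present, then split on "|".
    for part in line.strip().lstrip("+").split("|"):
        part = part.strip()
        if not part:
            continue  # empty piece after the trailing "|"
        if part.startswith("iteration"):
            # "iteration 5880/ 159576" carries the current and total iteration.
            current, total = re.findall(r"\d+", part)
            fields["iteration"] = int(current)
            fields["total iterations"] = int(total)
            continue
        # Every other field is assumed to look like "name: number".
        key, _, value = part.rpartition(":")
        fields[key.strip()] = float(value)
    return fields


# Usage: compare two consecutive entries (shortened to the first two fields).
prev = parse_log_line("iteration 5870/ 159576 | consumed samples: 170928 |")
curr = parse_log_line("+ iteration 5880/ 159576 | consumed samples: 171568 |")
steps = curr["iteration"] - prev["iteration"]
samples = curr["consumed samples"] - prev["consumed samples"]
print(samples / steps)  # 64.0, matching "global batch size: 64"
```

The same check applies to any pair of adjacent entries: the samples consumed between two log lines divided by the number of iterations between them should equal the reported global batch size.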