bigscience-bot
committed on
Commit
·
1338a52
1
Parent(s):
0652076
new data
logs/main_log.txt +12 -0
logs/main_log.txt
CHANGED
@@ -55315,3 +55315,15 @@ time (ms)
|
|
55315 |
time (ms)
|
55316 |
iteration 5870/ 159576 | consumed samples: 170928 | elapsed time per iteration (ms): 16844.9 | learning rate: 4.727E-05 | global batch size: 64 | lm loss: 6.372821E+00 | loss scale: 2048.0 | grad norm: 49107.159 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
55317 |
time (ms)
|
55318 |
+
iteration 5880/ 159576 | consumed samples: 171568 | elapsed time per iteration (ms): 16812.2 | learning rate: 4.745E-05 | global batch size: 64 | lm loss: 6.379050E+00 | loss scale: 2048.0 | grad norm: 76898.126 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
55319 |
+
time (ms)
|
55320 |
+
iteration 5890/ 159576 | consumed samples: 172208 | elapsed time per iteration (ms): 16819.7 | learning rate: 4.763E-05 | global batch size: 64 | lm loss: 6.333071E+00 | loss scale: 2048.0 | grad norm: 69874.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
55321 |
+
time (ms)
|
55322 |
+
iteration 5900/ 159576 | consumed samples: 172848 | elapsed time per iteration (ms): 16821.3 | learning rate: 4.780E-05 | global batch size: 64 | lm loss: 6.354385E+00 | loss scale: 2048.0 | grad norm: 57915.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
55323 |
+
time (ms)
|
55324 |
+
iteration 5910/ 159576 | consumed samples: 173488 | elapsed time per iteration (ms): 16679.9 | learning rate: 4.798E-05 | global batch size: 64 | lm loss: 6.361916E+00 | loss scale: 2048.0 | grad norm: 56535.869 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
55325 |
+
time (ms)
|
55326 |
+
iteration 5920/ 159576 | consumed samples: 174128 | elapsed time per iteration (ms): 16731.8 | learning rate: 4.816E-05 | global batch size: 64 | lm loss: 6.371978E+00 | loss scale: 2048.0 | grad norm: 75613.913 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
55327 |
+
time (ms)
|
55328 |
+
iteration 5930/ 159576 | consumed samples: 174768 | elapsed time per iteration (ms): 16796.3 | learning rate: 4.834E-05 | global batch size: 64 | lm loss: 6.373956E+00 | loss scale: 2048.0 | grad norm: 64436.905 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
55329 |
+
time (ms)
|
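The added log lines are pipe-delimited `key: value` fields, so they can be turned into records for plotting or sanity checks. The following is a minimal sketch, assuming every line follows the format shown in this diff; the function name `parse_log_line` is hypothetical and not part of the training code.

```python
import re


def parse_log_line(line):
    """Split one pipe-delimited training log line into a dict of fields."""
    record = {}
    for field in line.split("|"):
        field = field.strip()
        if not field:
            continue
        if field.startswith("iteration"):
            # "iteration 5880/ 159576" -> current and total iteration counts
            match = re.match(r"iteration\s+(\d+)/\s*(\d+)", field)
            if match:
                record["iteration"] = int(match.group(1))
                record["total_iterations"] = int(match.group(2))
            continue
        # Split on the last colon so keys such as
        # "elapsed time per iteration (ms)" stay intact.
        key, sep, value = field.rpartition(":")
        if not sep:
            continue
        try:
            record[key.strip()] = float(value)
        except ValueError:
            # Keep non-numeric values as plain strings.
            record[key.strip()] = value.strip()
    return record
```

Applied to the lines above, consecutive records can be compared directly, for example to check that `consumed samples` grows by 640 per 10 logged iterations at a global batch size of 64.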