bigscience-bot commited on
Commit
1271e23
1 Parent(s): 53fba9f
Files changed (1) hide show
  1. logs/main_log.txt +106 -0
logs/main_log.txt CHANGED
@@ -33998,3 +33998,109 @@ time (ms)
33998
  time (ms)
33999
  iteration 1192/ 159576 | consumed samples: 19072 | elapsed time per iteration (ms): 13665.3 | learning rate: 5.290E-06 | global batch size: 16 | lm loss: 7.350411E+00 | loss scale: 16384.0 | grad norm: 81228.032 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34000
  time (ms)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
33998
  time (ms)
33999
  iteration 1192/ 159576 | consumed samples: 19072 | elapsed time per iteration (ms): 13665.3 | learning rate: 5.290E-06 | global batch size: 16 | lm loss: 7.350411E+00 | loss scale: 16384.0 | grad norm: 81228.032 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34000
  time (ms)
34001
+ iteration 1193/ 159576 | consumed samples: 19088 | elapsed time per iteration (ms): 13585.9 | learning rate: 5.294E-06 | global batch size: 16 | lm loss: 7.583058E+00 | loss scale: 16384.0 | grad norm: 291080.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34002
+ time (ms)
34003
+ iteration 1194/ 159576 | consumed samples: 19104 | elapsed time per iteration (ms): 13658.0 | learning rate: 5.299E-06 | global batch size: 16 | lm loss: 7.808938E+00 | loss scale: 16384.0 | grad norm: 193632.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34004
+ time (ms)
34005
+ iteration 1195/ 159576 | consumed samples: 19120 | elapsed time per iteration (ms): 13777.0 | learning rate: 5.303E-06 | global batch size: 16 | lm loss: 7.459247E+00 | loss scale: 16384.0 | grad norm: 100738.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34006
+ time (ms)
34007
+ iteration 1196/ 159576 | consumed samples: 19136 | elapsed time per iteration (ms): 13624.3 | learning rate: 5.308E-06 | global batch size: 16 | lm loss: 7.240894E+00 | loss scale: 16384.0 | grad norm: 102223.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34008
+ time (ms)
34009
+ iteration 1197/ 159576 | consumed samples: 19152 | elapsed time per iteration (ms): 13630.2 | learning rate: 5.312E-06 | global batch size: 16 | lm loss: 7.469604E+00 | loss scale: 16384.0 | grad norm: 91547.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34010
+ time (ms)
34011
+ iteration 1198/ 159576 | consumed samples: 19168 | elapsed time per iteration (ms): 13603.4 | learning rate: 5.317E-06 | global batch size: 16 | lm loss: 7.399169E+00 | loss scale: 16384.0 | grad norm: 246196.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34012
+ time (ms)
34013
+ iteration 1199/ 159576 | consumed samples: 19184 | elapsed time per iteration (ms): 14028.5 | learning rate: 5.321E-06 | global batch size: 16 | lm loss: 7.465099E+00 | loss scale: 16384.0 | grad norm: 185665.583 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34014
+ time (ms)
34015
+ iteration 1200/ 159576 | consumed samples: 19200 | elapsed time per iteration (ms): 13601.1 | learning rate: 5.325E-06 | global batch size: 16 | lm loss: 7.383169E+00 | loss scale: 16384.0 | grad norm: 115872.720 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34016
+ time (ms)
34017
+ iteration 1201/ 159576 | consumed samples: 19216 | elapsed time per iteration (ms): 13566.6 | learning rate: 5.330E-06 | global batch size: 16 | lm loss: 7.352910E+00 | loss scale: 16384.0 | grad norm: 114834.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34018
+ time (ms)
34019
+ iteration 1202/ 159576 | consumed samples: 19232 | elapsed time per iteration (ms): 13557.4 | learning rate: 5.334E-06 | global batch size: 16 | lm loss: 7.521720E+00 | loss scale: 16384.0 | grad norm: 101976.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34020
+ time (ms)
34021
+ iteration 1203/ 159576 | consumed samples: 19248 | elapsed time per iteration (ms): 13525.0 | learning rate: 5.339E-06 | global batch size: 16 | lm loss: 7.225696E+00 | loss scale: 16384.0 | grad norm: 178745.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34022
+ time (ms)
34023
+ iteration 1204/ 159576 | consumed samples: 19264 | elapsed time per iteration (ms): 13539.3 | learning rate: 5.343E-06 | global batch size: 16 | lm loss: 7.375963E+00 | loss scale: 16384.0 | grad norm: 175723.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34024
+ time (ms)
34025
+ iteration 1205/ 159576 | consumed samples: 19280 | elapsed time per iteration (ms): 13532.3 | learning rate: 5.348E-06 | global batch size: 16 | lm loss: 7.402988E+00 | loss scale: 16384.0 | grad norm: 104645.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34026
+ time (ms)
34027
+ iteration 1206/ 159576 | consumed samples: 19296 | elapsed time per iteration (ms): 13502.9 | learning rate: 5.352E-06 | global batch size: 16 | lm loss: 7.302839E+00 | loss scale: 16384.0 | grad norm: 99328.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34028
+ time (ms)
34029
+ iteration 1207/ 159576 | consumed samples: 19312 | elapsed time per iteration (ms): 13540.4 | learning rate: 5.357E-06 | global batch size: 16 | lm loss: 7.555269E+00 | loss scale: 16384.0 | grad norm: 89166.858 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34030
+ time (ms)
34031
+ iteration 1208/ 159576 | consumed samples: 19328 | elapsed time per iteration (ms): 13900.0 | learning rate: 5.361E-06 | global batch size: 16 | lm loss: 7.459805E+00 | loss scale: 16384.0 | grad norm: 135152.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34032
+ time (ms)
34033
+ iteration 1209/ 159576 | consumed samples: 19344 | elapsed time per iteration (ms): 13560.6 | learning rate: 5.365E-06 | global batch size: 16 | lm loss: 7.419579E+00 | loss scale: 16384.0 | grad norm: 101249.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34034
+ time (ms)
34035
+ iteration 1210/ 159576 | consumed samples: 19360 | elapsed time per iteration (ms): 13658.8 | learning rate: 5.370E-06 | global batch size: 16 | lm loss: 7.348646E+00 | loss scale: 16384.0 | grad norm: 104483.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34036
+ time (ms)
34037
+ iteration 1211/ 159576 | consumed samples: 19376 | elapsed time per iteration (ms): 13533.6 | learning rate: 5.374E-06 | global batch size: 16 | lm loss: 7.494230E+00 | loss scale: 16384.0 | grad norm: 110210.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34038
+ time (ms)
34039
+ iteration 1212/ 159576 | consumed samples: 19392 | elapsed time per iteration (ms): 13905.0 | learning rate: 5.379E-06 | global batch size: 16 | lm loss: 7.390188E+00 | loss scale: 16384.0 | grad norm: 96645.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34040
+ time (ms)
34041
+ iteration 1213/ 159576 | consumed samples: 19408 | elapsed time per iteration (ms): 13673.2 | learning rate: 5.383E-06 | global batch size: 16 | lm loss: 7.318599E+00 | loss scale: 16384.0 | grad norm: 166216.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34042
+ time (ms)
34043
+ iteration 1214/ 159576 | consumed samples: 19424 | elapsed time per iteration (ms): 13582.9 | learning rate: 5.388E-06 | global batch size: 16 | lm loss: 7.262068E+00 | loss scale: 16384.0 | grad norm: 75724.522 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34044
+ time (ms)
34045
+ iteration 1215/ 159576 | consumed samples: 19440 | elapsed time per iteration (ms): 13570.1 | learning rate: 5.392E-06 | global batch size: 16 | lm loss: 7.594563E+00 | loss scale: 16384.0 | grad norm: 95306.819 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34046
+ time (ms)
34047
+ iteration 1216/ 159576 | consumed samples: 19456 | elapsed time per iteration (ms): 13639.7 | learning rate: 5.396E-06 | global batch size: 16 | lm loss: 7.375734E+00 | loss scale: 16384.0 | grad norm: 86152.125 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34048
+ time (ms)
34049
+ iteration 1217/ 159576 | consumed samples: 19472 | elapsed time per iteration (ms): 14091.6 | learning rate: 5.401E-06 | global batch size: 16 | lm loss: 7.213047E+00 | loss scale: 16384.0 | grad norm: 95583.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34050
+ time (ms)
34051
+ iteration 1218/ 159576 | consumed samples: 19488 | elapsed time per iteration (ms): 13516.3 | learning rate: 5.405E-06 | global batch size: 16 | lm loss: 7.437682E+00 | loss scale: 16384.0 | grad norm: 221549.634 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34052
+ time (ms)
34053
+ iteration 1219/ 159576 | consumed samples: 19504 | elapsed time per iteration (ms): 13610.0 | learning rate: 5.410E-06 | global batch size: 16 | lm loss: 7.254605E+00 | loss scale: 16384.0 | grad norm: 97554.516 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34054
+ time (ms)
34055
+ iteration 1220/ 159576 | consumed samples: 19520 | elapsed time per iteration (ms): 13565.5 | learning rate: 5.414E-06 | global batch size: 16 | lm loss: 7.248229E+00 | loss scale: 16384.0 | grad norm: 89138.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34056
+ time (ms)
34057
+ iteration 1221/ 159576 | consumed samples: 19536 | elapsed time per iteration (ms): 13989.3 | learning rate: 5.419E-06 | global batch size: 16 | lm loss: 7.313151E+00 | loss scale: 16384.0 | grad norm: 172651.828 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34058
+ time (ms)
34059
+ iteration 1222/ 159576 | consumed samples: 19552 | elapsed time per iteration (ms): 13602.4 | learning rate: 5.423E-06 | global batch size: 16 | lm loss: 7.476789E+00 | loss scale: 16384.0 | grad norm: 67387.822 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34060
+ time (ms)
34061
+ iteration 1223/ 159576 | consumed samples: 19568 | elapsed time per iteration (ms): 13656.0 | learning rate: 5.428E-06 | global batch size: 16 | lm loss: 7.289939E+00 | loss scale: 16384.0 | grad norm: 207125.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34062
+ time (ms)
34063
+ iteration 1224/ 159576 | consumed samples: 19584 | elapsed time per iteration (ms): 13537.8 | learning rate: 5.432E-06 | global batch size: 16 | lm loss: 7.409894E+00 | loss scale: 16384.0 | grad norm: 156218.537 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34064
+ time (ms)
34065
+ iteration 1225/ 159576 | consumed samples: 19600 | elapsed time per iteration (ms): 13600.0 | learning rate: 5.436E-06 | global batch size: 16 | lm loss: 7.226832E+00 | loss scale: 16384.0 | grad norm: 93258.536 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34066
+ time (ms)
34067
+ iteration 1226/ 159576 | consumed samples: 19616 | elapsed time per iteration (ms): 13778.7 | learning rate: 5.441E-06 | global batch size: 16 | lm loss: 7.406470E+00 | loss scale: 16384.0 | grad norm: 95037.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34068
+ time (ms)
34069
+ iteration 1227/ 159576 | consumed samples: 19632 | elapsed time per iteration (ms): 13609.5 | learning rate: 5.445E-06 | global batch size: 16 | lm loss: 7.385060E+00 | loss scale: 16384.0 | grad norm: 77831.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34070
+ time (ms)
34071
+ iteration 1228/ 159576 | consumed samples: 19648 | elapsed time per iteration (ms): 13561.8 | learning rate: 5.450E-06 | global batch size: 16 | lm loss: 7.283795E+00 | loss scale: 16384.0 | grad norm: 219813.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34072
+ time (ms)
34073
+ iteration 1229/ 159576 | consumed samples: 19664 | elapsed time per iteration (ms): 13619.4 | learning rate: 5.454E-06 | global batch size: 16 | lm loss: 7.344219E+00 | loss scale: 16384.0 | grad norm: 122192.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34074
+ time (ms)
34075
+ iteration 1230/ 159576 | consumed samples: 19680 | elapsed time per iteration (ms): 14054.6 | learning rate: 5.459E-06 | global batch size: 16 | lm loss: 7.364305E+00 | loss scale: 16384.0 | grad norm: 90944.731 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34076
+ time (ms)
34077
+ iteration 1231/ 159576 | consumed samples: 19696 | elapsed time per iteration (ms): 13589.9 | learning rate: 5.463E-06 | global batch size: 16 | lm loss: 7.421730E+00 | loss scale: 16384.0 | grad norm: 178816.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34078
+ time (ms)
34079
+ iteration 1232/ 159576 | consumed samples: 19712 | elapsed time per iteration (ms): 13624.6 | learning rate: 5.467E-06 | global batch size: 16 | lm loss: 7.278720E+00 | loss scale: 16384.0 | grad norm: 101190.498 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34080
+ time (ms)
34081
+ iteration 1233/ 159576 | consumed samples: 19728 | elapsed time per iteration (ms): 13574.7 | learning rate: 5.472E-06 | global batch size: 16 | lm loss: 7.525582E+00 | loss scale: 16384.0 | grad norm: 95476.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34082
+ time (ms)
34083
+ iteration 1234/ 159576 | consumed samples: 19744 | elapsed time per iteration (ms): 13981.0 | learning rate: 5.476E-06 | global batch size: 16 | lm loss: 7.294508E+00 | loss scale: 16384.0 | grad norm: 110379.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34084
+ time (ms)
34085
+ iteration 1235/ 159576 | consumed samples: 19760 | elapsed time per iteration (ms): 13641.1 | learning rate: 5.481E-06 | global batch size: 16 | lm loss: 7.431972E+00 | loss scale: 16384.0 | grad norm: 103188.497 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34086
+ time (ms)
34087
+ iteration 1236/ 159576 | consumed samples: 19776 | elapsed time per iteration (ms): 13575.4 | learning rate: 5.485E-06 | global batch size: 16 | lm loss: 7.397687E+00 | loss scale: 16384.0 | grad norm: 92125.975 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34088
+ time (ms)
34089
+ iteration 1237/ 159576 | consumed samples: 19792 | elapsed time per iteration (ms): 13672.0 | learning rate: 5.490E-06 | global batch size: 16 | lm loss: 7.314774E+00 | loss scale: 16384.0 | grad norm: 75870.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34090
+ time (ms)
34091
+ iteration 1238/ 159576 | consumed samples: 19808 | elapsed time per iteration (ms): 13509.4 | learning rate: 5.494E-06 | global batch size: 16 | lm loss: 7.187806E+00 | loss scale: 16384.0 | grad norm: 173296.806 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34092
+ time (ms)
34093
+ iteration 1239/ 159576 | consumed samples: 19824 | elapsed time per iteration (ms): 13875.3 | learning rate: 5.499E-06 | global batch size: 16 | lm loss: 7.376097E+00 | loss scale: 16384.0 | grad norm: 133632.906 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34094
+ time (ms)
34095
+ iteration 1240/ 159576 | consumed samples: 19840 | elapsed time per iteration (ms): 13610.1 | learning rate: 5.503E-06 | global batch size: 16 | lm loss: 7.267582E+00 | loss scale: 16384.0 | grad norm: 85104.985 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34096
+ time (ms)
34097
+ iteration 1241/ 159576 | consumed samples: 19856 | elapsed time per iteration (ms): 13551.5 | learning rate: 5.507E-06 | global batch size: 16 | lm loss: 7.352735E+00 | loss scale: 16384.0 | grad norm: 90699.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34098
+ time (ms)
34099
+ iteration 1242/ 159576 | consumed samples: 19872 | elapsed time per iteration (ms): 13593.9 | learning rate: 5.512E-06 | global batch size: 16 | lm loss: 7.468503E+00 | loss scale: 16384.0 | grad norm: 83188.176 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34100
+ time (ms)
34101
+ iteration 1243/ 159576 | consumed samples: 19888 | elapsed time per iteration (ms): 13930.9 | learning rate: 5.516E-06 | global batch size: 16 | lm loss: 7.214951E+00 | loss scale: 16384.0 | grad norm: 78366.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34102
+ time (ms)
34103
+ iteration 1244/ 159576 | consumed samples: 19904 | elapsed time per iteration (ms): 13652.1 | learning rate: 5.521E-06 | global batch size: 16 | lm loss: 7.260246E+00 | loss scale: 16384.0 | grad norm: 80928.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
34104
+ time (ms)
34105
+ [2021-09-24 07:03:47] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1162855_[2-10%1] on 'gpu_p13' partition)
34106
+ [2021-09-24 07:03:47] PULSE: tr8-104B is running for 1:11:36 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])