bigscience-bot
committed on
Commit
•
1271e23
1
Parent(s):
53fba9f
new data
- logs/main_log.txt +106 -0
logs/main_log.txt
CHANGED
@@ -33998,3 +33998,109 @@ time (ms)
|
|
33998 |
time (ms)
|
33999 |
iteration 1192/ 159576 | consumed samples: 19072 | elapsed time per iteration (ms): 13665.3 | learning rate: 5.290E-06 | global batch size: 16 | lm loss: 7.350411E+00 | loss scale: 16384.0 | grad norm: 81228.032 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34000 |
time (ms)
|
34001 |
+
iteration 1193/ 159576 | consumed samples: 19088 | elapsed time per iteration (ms): 13585.9 | learning rate: 5.294E-06 | global batch size: 16 | lm loss: 7.583058E+00 | loss scale: 16384.0 | grad norm: 291080.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34002 |
+
time (ms)
|
34003 |
+
iteration 1194/ 159576 | consumed samples: 19104 | elapsed time per iteration (ms): 13658.0 | learning rate: 5.299E-06 | global batch size: 16 | lm loss: 7.808938E+00 | loss scale: 16384.0 | grad norm: 193632.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34004 |
+
time (ms)
|
34005 |
+
iteration 1195/ 159576 | consumed samples: 19120 | elapsed time per iteration (ms): 13777.0 | learning rate: 5.303E-06 | global batch size: 16 | lm loss: 7.459247E+00 | loss scale: 16384.0 | grad norm: 100738.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34006 |
+
time (ms)
|
34007 |
+
iteration 1196/ 159576 | consumed samples: 19136 | elapsed time per iteration (ms): 13624.3 | learning rate: 5.308E-06 | global batch size: 16 | lm loss: 7.240894E+00 | loss scale: 16384.0 | grad norm: 102223.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34008 |
+
time (ms)
|
34009 |
+
iteration 1197/ 159576 | consumed samples: 19152 | elapsed time per iteration (ms): 13630.2 | learning rate: 5.312E-06 | global batch size: 16 | lm loss: 7.469604E+00 | loss scale: 16384.0 | grad norm: 91547.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34010 |
+
time (ms)
|
34011 |
+
iteration 1198/ 159576 | consumed samples: 19168 | elapsed time per iteration (ms): 13603.4 | learning rate: 5.317E-06 | global batch size: 16 | lm loss: 7.399169E+00 | loss scale: 16384.0 | grad norm: 246196.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34012 |
+
time (ms)
|
34013 |
+
iteration 1199/ 159576 | consumed samples: 19184 | elapsed time per iteration (ms): 14028.5 | learning rate: 5.321E-06 | global batch size: 16 | lm loss: 7.465099E+00 | loss scale: 16384.0 | grad norm: 185665.583 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34014 |
+
time (ms)
|
34015 |
+
iteration 1200/ 159576 | consumed samples: 19200 | elapsed time per iteration (ms): 13601.1 | learning rate: 5.325E-06 | global batch size: 16 | lm loss: 7.383169E+00 | loss scale: 16384.0 | grad norm: 115872.720 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34016 |
+
time (ms)
|
34017 |
+
iteration 1201/ 159576 | consumed samples: 19216 | elapsed time per iteration (ms): 13566.6 | learning rate: 5.330E-06 | global batch size: 16 | lm loss: 7.352910E+00 | loss scale: 16384.0 | grad norm: 114834.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34018 |
+
time (ms)
|
34019 |
+
iteration 1202/ 159576 | consumed samples: 19232 | elapsed time per iteration (ms): 13557.4 | learning rate: 5.334E-06 | global batch size: 16 | lm loss: 7.521720E+00 | loss scale: 16384.0 | grad norm: 101976.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34020 |
+
time (ms)
|
34021 |
+
iteration 1203/ 159576 | consumed samples: 19248 | elapsed time per iteration (ms): 13525.0 | learning rate: 5.339E-06 | global batch size: 16 | lm loss: 7.225696E+00 | loss scale: 16384.0 | grad norm: 178745.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34022 |
+
time (ms)
|
34023 |
+
iteration 1204/ 159576 | consumed samples: 19264 | elapsed time per iteration (ms): 13539.3 | learning rate: 5.343E-06 | global batch size: 16 | lm loss: 7.375963E+00 | loss scale: 16384.0 | grad norm: 175723.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34024 |
+
time (ms)
|
34025 |
+
iteration 1205/ 159576 | consumed samples: 19280 | elapsed time per iteration (ms): 13532.3 | learning rate: 5.348E-06 | global batch size: 16 | lm loss: 7.402988E+00 | loss scale: 16384.0 | grad norm: 104645.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34026 |
+
time (ms)
|
34027 |
+
iteration 1206/ 159576 | consumed samples: 19296 | elapsed time per iteration (ms): 13502.9 | learning rate: 5.352E-06 | global batch size: 16 | lm loss: 7.302839E+00 | loss scale: 16384.0 | grad norm: 99328.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34028 |
+
time (ms)
|
34029 |
+
iteration 1207/ 159576 | consumed samples: 19312 | elapsed time per iteration (ms): 13540.4 | learning rate: 5.357E-06 | global batch size: 16 | lm loss: 7.555269E+00 | loss scale: 16384.0 | grad norm: 89166.858 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34030 |
+
time (ms)
|
34031 |
+
iteration 1208/ 159576 | consumed samples: 19328 | elapsed time per iteration (ms): 13900.0 | learning rate: 5.361E-06 | global batch size: 16 | lm loss: 7.459805E+00 | loss scale: 16384.0 | grad norm: 135152.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34032 |
+
time (ms)
|
34033 |
+
iteration 1209/ 159576 | consumed samples: 19344 | elapsed time per iteration (ms): 13560.6 | learning rate: 5.365E-06 | global batch size: 16 | lm loss: 7.419579E+00 | loss scale: 16384.0 | grad norm: 101249.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34034 |
+
time (ms)
|
34035 |
+
iteration 1210/ 159576 | consumed samples: 19360 | elapsed time per iteration (ms): 13658.8 | learning rate: 5.370E-06 | global batch size: 16 | lm loss: 7.348646E+00 | loss scale: 16384.0 | grad norm: 104483.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34036 |
+
time (ms)
|
34037 |
+
iteration 1211/ 159576 | consumed samples: 19376 | elapsed time per iteration (ms): 13533.6 | learning rate: 5.374E-06 | global batch size: 16 | lm loss: 7.494230E+00 | loss scale: 16384.0 | grad norm: 110210.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34038 |
+
time (ms)
|
34039 |
+
iteration 1212/ 159576 | consumed samples: 19392 | elapsed time per iteration (ms): 13905.0 | learning rate: 5.379E-06 | global batch size: 16 | lm loss: 7.390188E+00 | loss scale: 16384.0 | grad norm: 96645.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34040 |
+
time (ms)
|
34041 |
+
iteration 1213/ 159576 | consumed samples: 19408 | elapsed time per iteration (ms): 13673.2 | learning rate: 5.383E-06 | global batch size: 16 | lm loss: 7.318599E+00 | loss scale: 16384.0 | grad norm: 166216.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34042 |
+
time (ms)
|
34043 |
+
iteration 1214/ 159576 | consumed samples: 19424 | elapsed time per iteration (ms): 13582.9 | learning rate: 5.388E-06 | global batch size: 16 | lm loss: 7.262068E+00 | loss scale: 16384.0 | grad norm: 75724.522 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34044 |
+
time (ms)
|
34045 |
+
iteration 1215/ 159576 | consumed samples: 19440 | elapsed time per iteration (ms): 13570.1 | learning rate: 5.392E-06 | global batch size: 16 | lm loss: 7.594563E+00 | loss scale: 16384.0 | grad norm: 95306.819 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34046 |
+
time (ms)
|
34047 |
+
iteration 1216/ 159576 | consumed samples: 19456 | elapsed time per iteration (ms): 13639.7 | learning rate: 5.396E-06 | global batch size: 16 | lm loss: 7.375734E+00 | loss scale: 16384.0 | grad norm: 86152.125 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34048 |
+
time (ms)
|
34049 |
+
iteration 1217/ 159576 | consumed samples: 19472 | elapsed time per iteration (ms): 14091.6 | learning rate: 5.401E-06 | global batch size: 16 | lm loss: 7.213047E+00 | loss scale: 16384.0 | grad norm: 95583.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34050 |
+
time (ms)
|
34051 |
+
iteration 1218/ 159576 | consumed samples: 19488 | elapsed time per iteration (ms): 13516.3 | learning rate: 5.405E-06 | global batch size: 16 | lm loss: 7.437682E+00 | loss scale: 16384.0 | grad norm: 221549.634 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34052 |
+
time (ms)
|
34053 |
+
iteration 1219/ 159576 | consumed samples: 19504 | elapsed time per iteration (ms): 13610.0 | learning rate: 5.410E-06 | global batch size: 16 | lm loss: 7.254605E+00 | loss scale: 16384.0 | grad norm: 97554.516 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34054 |
+
time (ms)
|
34055 |
+
iteration 1220/ 159576 | consumed samples: 19520 | elapsed time per iteration (ms): 13565.5 | learning rate: 5.414E-06 | global batch size: 16 | lm loss: 7.248229E+00 | loss scale: 16384.0 | grad norm: 89138.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34056 |
+
time (ms)
|
34057 |
+
iteration 1221/ 159576 | consumed samples: 19536 | elapsed time per iteration (ms): 13989.3 | learning rate: 5.419E-06 | global batch size: 16 | lm loss: 7.313151E+00 | loss scale: 16384.0 | grad norm: 172651.828 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34058 |
+
time (ms)
|
34059 |
+
iteration 1222/ 159576 | consumed samples: 19552 | elapsed time per iteration (ms): 13602.4 | learning rate: 5.423E-06 | global batch size: 16 | lm loss: 7.476789E+00 | loss scale: 16384.0 | grad norm: 67387.822 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34060 |
+
time (ms)
|
34061 |
+
iteration 1223/ 159576 | consumed samples: 19568 | elapsed time per iteration (ms): 13656.0 | learning rate: 5.428E-06 | global batch size: 16 | lm loss: 7.289939E+00 | loss scale: 16384.0 | grad norm: 207125.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34062 |
+
time (ms)
|
34063 |
+
iteration 1224/ 159576 | consumed samples: 19584 | elapsed time per iteration (ms): 13537.8 | learning rate: 5.432E-06 | global batch size: 16 | lm loss: 7.409894E+00 | loss scale: 16384.0 | grad norm: 156218.537 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34064 |
+
time (ms)
|
34065 |
+
iteration 1225/ 159576 | consumed samples: 19600 | elapsed time per iteration (ms): 13600.0 | learning rate: 5.436E-06 | global batch size: 16 | lm loss: 7.226832E+00 | loss scale: 16384.0 | grad norm: 93258.536 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34066 |
+
time (ms)
|
34067 |
+
iteration 1226/ 159576 | consumed samples: 19616 | elapsed time per iteration (ms): 13778.7 | learning rate: 5.441E-06 | global batch size: 16 | lm loss: 7.406470E+00 | loss scale: 16384.0 | grad norm: 95037.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34068 |
+
time (ms)
|
34069 |
+
iteration 1227/ 159576 | consumed samples: 19632 | elapsed time per iteration (ms): 13609.5 | learning rate: 5.445E-06 | global batch size: 16 | lm loss: 7.385060E+00 | loss scale: 16384.0 | grad norm: 77831.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34070 |
+
time (ms)
|
34071 |
+
iteration 1228/ 159576 | consumed samples: 19648 | elapsed time per iteration (ms): 13561.8 | learning rate: 5.450E-06 | global batch size: 16 | lm loss: 7.283795E+00 | loss scale: 16384.0 | grad norm: 219813.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34072 |
+
time (ms)
|
34073 |
+
iteration 1229/ 159576 | consumed samples: 19664 | elapsed time per iteration (ms): 13619.4 | learning rate: 5.454E-06 | global batch size: 16 | lm loss: 7.344219E+00 | loss scale: 16384.0 | grad norm: 122192.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34074 |
+
time (ms)
|
34075 |
+
iteration 1230/ 159576 | consumed samples: 19680 | elapsed time per iteration (ms): 14054.6 | learning rate: 5.459E-06 | global batch size: 16 | lm loss: 7.364305E+00 | loss scale: 16384.0 | grad norm: 90944.731 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34076 |
+
time (ms)
|
34077 |
+
iteration 1231/ 159576 | consumed samples: 19696 | elapsed time per iteration (ms): 13589.9 | learning rate: 5.463E-06 | global batch size: 16 | lm loss: 7.421730E+00 | loss scale: 16384.0 | grad norm: 178816.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34078 |
+
time (ms)
|
34079 |
+
iteration 1232/ 159576 | consumed samples: 19712 | elapsed time per iteration (ms): 13624.6 | learning rate: 5.467E-06 | global batch size: 16 | lm loss: 7.278720E+00 | loss scale: 16384.0 | grad norm: 101190.498 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34080 |
+
time (ms)
|
34081 |
+
iteration 1233/ 159576 | consumed samples: 19728 | elapsed time per iteration (ms): 13574.7 | learning rate: 5.472E-06 | global batch size: 16 | lm loss: 7.525582E+00 | loss scale: 16384.0 | grad norm: 95476.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34082 |
+
time (ms)
|
34083 |
+
iteration 1234/ 159576 | consumed samples: 19744 | elapsed time per iteration (ms): 13981.0 | learning rate: 5.476E-06 | global batch size: 16 | lm loss: 7.294508E+00 | loss scale: 16384.0 | grad norm: 110379.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34084 |
+
time (ms)
|
34085 |
+
iteration 1235/ 159576 | consumed samples: 19760 | elapsed time per iteration (ms): 13641.1 | learning rate: 5.481E-06 | global batch size: 16 | lm loss: 7.431972E+00 | loss scale: 16384.0 | grad norm: 103188.497 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34086 |
+
time (ms)
|
34087 |
+
iteration 1236/ 159576 | consumed samples: 19776 | elapsed time per iteration (ms): 13575.4 | learning rate: 5.485E-06 | global batch size: 16 | lm loss: 7.397687E+00 | loss scale: 16384.0 | grad norm: 92125.975 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34088 |
+
time (ms)
|
34089 |
+
iteration 1237/ 159576 | consumed samples: 19792 | elapsed time per iteration (ms): 13672.0 | learning rate: 5.490E-06 | global batch size: 16 | lm loss: 7.314774E+00 | loss scale: 16384.0 | grad norm: 75870.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34090 |
+
time (ms)
|
34091 |
+
iteration 1238/ 159576 | consumed samples: 19808 | elapsed time per iteration (ms): 13509.4 | learning rate: 5.494E-06 | global batch size: 16 | lm loss: 7.187806E+00 | loss scale: 16384.0 | grad norm: 173296.806 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34092 |
+
time (ms)
|
34093 |
+
iteration 1239/ 159576 | consumed samples: 19824 | elapsed time per iteration (ms): 13875.3 | learning rate: 5.499E-06 | global batch size: 16 | lm loss: 7.376097E+00 | loss scale: 16384.0 | grad norm: 133632.906 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34094 |
+
time (ms)
|
34095 |
+
iteration 1240/ 159576 | consumed samples: 19840 | elapsed time per iteration (ms): 13610.1 | learning rate: 5.503E-06 | global batch size: 16 | lm loss: 7.267582E+00 | loss scale: 16384.0 | grad norm: 85104.985 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34096 |
+
time (ms)
|
34097 |
+
iteration 1241/ 159576 | consumed samples: 19856 | elapsed time per iteration (ms): 13551.5 | learning rate: 5.507E-06 | global batch size: 16 | lm loss: 7.352735E+00 | loss scale: 16384.0 | grad norm: 90699.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34098 |
+
time (ms)
|
34099 |
+
iteration 1242/ 159576 | consumed samples: 19872 | elapsed time per iteration (ms): 13593.9 | learning rate: 5.512E-06 | global batch size: 16 | lm loss: 7.468503E+00 | loss scale: 16384.0 | grad norm: 83188.176 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34100 |
+
time (ms)
|
34101 |
+
iteration 1243/ 159576 | consumed samples: 19888 | elapsed time per iteration (ms): 13930.9 | learning rate: 5.516E-06 | global batch size: 16 | lm loss: 7.214951E+00 | loss scale: 16384.0 | grad norm: 78366.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34102 |
+
time (ms)
|
34103 |
+
iteration 1244/ 159576 | consumed samples: 19904 | elapsed time per iteration (ms): 13652.1 | learning rate: 5.521E-06 | global batch size: 16 | lm loss: 7.260246E+00 | loss scale: 16384.0 | grad norm: 80928.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
34104 |
+
time (ms)
|
34105 |
+
[2021-09-24 07:03:47] PULSE: tr8-104B is waiting for the previous Job Array job to finish before scheduling a new one (1162855_[2-10%1] on 'gpu_p13' partition)
|
34106 |
+
[2021-09-24 07:03:47] PULSE: tr8-104B is running for 1:11:36 since 2021-09-24T05:52:11 (1162855_1 on 'gpu_p13' partition (r6i4n[5,7],r6i5n[2,7-8],r6i6n[0,2,6],r7i2n[4-5],r7i6n[2-4],r7i7n[7-8],r8i0n[2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-2],r9i1n[0-3],r9i2n[3-5,8],r9i3n[0-1,7-8],r9i4n[0-2],r9i5n[3-8],r9i6n[0,7-8])
|
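The iteration lines above share a fixed `key: value | key: value |` layout, so they can be split into fields mechanically. The following is a minimal sketch, not part of the training code base: the function name `parse_iteration_line` and the dictionary keys are hypothetical, and the sample line is copied from iteration 1244 in this log.

```python
import re

# Sample line copied verbatim from the log above (iteration 1244).
SAMPLE = (
    "iteration 1244/ 159576 | consumed samples: 19904 | "
    "elapsed time per iteration (ms): 13652.1 | learning rate: 5.521E-06 | "
    "global batch size: 16 | lm loss: 7.260246E+00 | loss scale: 16384.0 | "
    "grad norm: 80928.941 | num zeros: 0.0 | "
    "number of skipped iterations: 0 | number of nan iterations: 0 |"
)


def parse_iteration_line(line):
    """Return a dict of the fields in one iteration line.

    Returns None for lines that are not iteration lines, such as the
    'time (ms)' or 'PULSE' lines. Hypothetical helper for illustration.
    """
    line = line.strip()
    head = re.match(r"iteration\s+(\d+)/\s*(\d+)\s*\|", line)
    if head is None:
        return None
    record = {
        "iteration": int(head.group(1)),
        "total iterations": int(head.group(2)),
    }
    # The remaining fields are "name: value" pairs separated by "|".
    for field in line[head.end():].split("|"):
        key, sep, value = field.partition(":")
        if not sep:
            continue  # skip the empty piece after the trailing "|"
        record[key.strip()] = float(value)
    return record


record = parse_iteration_line(SAMPLE)
print(record["iteration"], record["lm loss"], record["grad norm"])
```

For the lines shown here, `consumed samples` equals `iteration` times `global batch size` (for example, 1244 × 16 = 19904), which makes a convenient sanity check on parsed records.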