05/07/2023 04:42:05 WARNING Found cached dataset parquet (/home/pszemraj/.cache/huggingface/datasets/OpenAssistant___parquet/OpenAssistant--oasst1-2960c57d7e52ab15/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
05/07/2023 04:42:06 INFO Quantized model will be saved to: /home/pszemraj/workspace/misc-train/quantization/quantized-models/stablelm-7b-sft-v7-epoch-3-4bit-128g
05/07/2023 04:42:14 INFO Running quantization..
05/07/2023 04:42:16 INFO Start quantizing layer 1/16
05/07/2023 04:42:49 INFO Quantizing attention.query_key_value in layer 1/16...
05/07/2023 04:42:50 INFO duration: 1.0365328788757324
05/07/2023 04:42:50 INFO avg loss: 0.2228083991395018
05/07/2023 04:43:23 INFO Quantizing attention.dense in layer 1/16...
05/07/2023 04:43:24 INFO duration: 0.7084124088287354
05/07/2023 04:43:24 INFO avg loss: 0.01904001936744958
05/07/2023 04:43:57 INFO Quantizing mlp.dense_h_to_4h in layer 1/16...
05/07/2023 04:43:58 INFO duration: 1.0652313232421875
05/07/2023 04:43:58 INFO avg loss: 0.3040119207705
05/07/2023 04:47:44 INFO Quantizing mlp.dense_4h_to_h in layer 1/16...
05/07/2023 04:47:51 INFO duration: 6.762867212295532
05/07/2023 04:47:51 INFO avg loss: 0.028748639221516405
05/07/2023 04:48:12 INFO Start quantizing layer 2/16
05/07/2023 04:48:45 INFO Quantizing attention.query_key_value in layer 2/16...
05/07/2023 04:48:46 INFO duration: 0.9713742733001709
05/07/2023 04:48:46 INFO avg loss: 0.35355199259310105
05/07/2023 04:49:19 INFO Quantizing attention.dense in layer 2/16...
05/07/2023 04:49:20 INFO duration: 0.7275807857513428
05/07/2023 04:49:20 INFO avg loss: 0.06647738861961487
05/07/2023 04:49:53 INFO Quantizing mlp.dense_h_to_4h in layer 2/16...
05/07/2023 04:49:54 INFO duration: 1.083951711654663
05/07/2023 04:49:54 INFO avg loss: 0.6772610437882721
05/07/2023 04:53:40 INFO Quantizing mlp.dense_4h_to_h in layer 2/16...
05/07/2023 04:53:47 INFO duration: 6.844736814498901
05/07/2023 04:53:47 INFO avg loss: 0.05320497620473908
05/07/2023 04:54:08 INFO Start quantizing layer 3/16
05/07/2023 04:54:41 INFO Quantizing attention.query_key_value in layer 3/16...
05/07/2023 04:54:42 INFO duration: 0.9685044288635254
05/07/2023 04:54:42 INFO avg loss: 0.6015139448756989
05/07/2023 04:55:15 INFO Quantizing attention.dense in layer 3/16...
05/07/2023 04:55:16 INFO duration: 0.7167198657989502
05/07/2023 04:55:16 INFO avg loss: 0.06039099241344058
05/07/2023 04:55:49 INFO Quantizing mlp.dense_h_to_4h in layer 3/16...
05/07/2023 04:55:50 INFO duration: 1.0765190124511719
05/07/2023 04:55:50 INFO avg loss: 1.3903707193490416
05/07/2023 04:59:37 INFO Quantizing mlp.dense_4h_to_h in layer 3/16...
05/07/2023 04:59:43 INFO duration: 6.270395040512085
05/07/2023 04:59:43 INFO avg loss: 0.181059166011465
05/07/2023 05:00:04 INFO Start quantizing layer 4/16
05/07/2023 05:00:37 INFO Quantizing attention.query_key_value in layer 4/16...
05/07/2023 05:00:38 INFO duration: 0.9672496318817139
05/07/2023 05:00:38 INFO avg loss: 0.9807066506090255
05/07/2023 05:01:11 INFO Quantizing attention.dense in layer 4/16...
05/07/2023 05:01:12 INFO duration: 0.7248861789703369
05/07/2023 05:01:12 INFO avg loss: 0.1315788618418863
05/07/2023 05:01:45 INFO Quantizing mlp.dense_h_to_4h in layer 4/16...
05/07/2023 05:01:46 INFO duration: 1.083066463470459
05/07/2023 05:01:46 INFO avg loss: 2.080002984807641
05/07/2023 05:05:32 INFO Quantizing mlp.dense_4h_to_h in layer 4/16...
05/07/2023 05:05:38 INFO duration: 6.18793797492981
05/07/2023 05:05:38 INFO avg loss: 0.252437506240016
05/07/2023 05:05:59 INFO Start quantizing layer 5/16
05/07/2023 05:06:32 INFO Quantizing attention.query_key_value in layer 5/16...
05/07/2023 05:06:33 INFO duration: 0.9693779945373535
05/07/2023 05:06:33 INFO avg loss: 1.3782398682940629
05/07/2023 05:07:06 INFO Quantizing attention.dense in layer 5/16...
05/07/2023 05:07:07 INFO duration: 0.7210879325866699
05/07/2023 05:07:07 INFO avg loss: 0.14899523392779884
05/07/2023 05:07:40 INFO Quantizing mlp.dense_h_to_4h in layer 5/16...
05/07/2023 05:07:41 INFO duration: 1.0800914764404297
05/07/2023 05:07:41 INFO avg loss: 2.332041130025293
05/07/2023 05:11:27 INFO Quantizing mlp.dense_4h_to_h in layer 5/16...
05/07/2023 05:11:33 INFO duration: 6.191901206970215
05/07/2023 05:11:33 INFO avg loss: 0.3255492384060503
05/07/2023 05:11:54 INFO Start quantizing layer 6/16
05/07/2023 05:12:27 INFO Quantizing attention.query_key_value in layer 6/16...
05/07/2023 05:12:28 INFO duration: 0.9662725925445557
05/07/2023 05:12:28 INFO avg loss: 1.757845780085197
05/07/2023 05:13:01 INFO Quantizing attention.dense in layer 6/16...
05/07/2023 05:13:02 INFO duration: 0.7185342311859131
05/07/2023 05:13:02 INFO avg loss: 0.15947506450616514
05/07/2023 05:13:35 INFO Quantizing mlp.dense_h_to_4h in layer 6/16...
05/07/2023 05:13:36 INFO duration: 1.075429916381836
05/07/2023 05:13:36 INFO avg loss: 2.4491654498635516
05/07/2023 05:17:18 INFO Quantizing mlp.dense_4h_to_h in layer 6/16...
05/07/2023 05:17:24 INFO duration: 5.919256925582886
05/07/2023 05:17:24 INFO avg loss: 0.40534172017480363
05/07/2023 05:17:45 INFO Start quantizing layer 7/16
05/07/2023 05:18:18 INFO Quantizing attention.query_key_value in layer 7/16...
05/07/2023 05:18:19 INFO duration: 0.9676733016967773
05/07/2023 05:18:19 INFO avg loss: 2.131913417698349
05/07/2023 05:18:52 INFO Quantizing attention.dense in layer 7/16...
05/07/2023 05:18:53 INFO duration: 0.7196581363677979
05/07/2023 05:18:53 INFO avg loss: 0.20212076367915502
05/07/2023 05:19:26 INFO Quantizing mlp.dense_h_to_4h in layer 7/16...
05/07/2023 05:19:27 INFO duration: 1.0817346572875977
05/07/2023 05:19:27 INFO avg loss: 2.4321377462726304
05/07/2023 05:23:08 INFO Quantizing mlp.dense_4h_to_h in layer 7/16...
05/07/2023 05:23:14 INFO duration: 5.973307132720947
05/07/2023 05:23:14 INFO avg loss: 0.4796293378511049
05/07/2023 05:23:35 INFO Start quantizing layer 8/16
05/07/2023 05:24:08 INFO Quantizing attention.query_key_value in layer 8/16...
05/07/2023 05:24:09 INFO duration: 0.9668700695037842
05/07/2023 05:24:09 INFO avg loss: 2.3333008332501333
05/07/2023 05:24:42 INFO Quantizing attention.dense in layer 8/16...
05/07/2023 05:24:43 INFO duration: 0.7205338478088379
05/07/2023 05:24:43 INFO avg loss: 0.2906766491322218
05/07/2023 05:25:16 INFO Quantizing mlp.dense_h_to_4h in layer 8/16...
05/07/2023 05:25:17 INFO duration: 1.075392246246338
05/07/2023 05:25:17 INFO avg loss: 2.088160245690229
05/07/2023 05:28:59 INFO Quantizing mlp.dense_4h_to_h in layer 8/16...
05/07/2023 05:29:05 INFO duration: 6.0966198444366455
05/07/2023 05:29:05 INFO avg loss: 0.4126856014751398
05/07/2023 05:29:26 INFO Start quantizing layer 9/16
05/07/2023 05:29:59 INFO Quantizing attention.query_key_value in layer 9/16...
05/07/2023 05:30:00 INFO duration: 0.971062183380127
05/07/2023 05:30:00 INFO avg loss: 4.631909777689031
05/07/2023 05:30:33 INFO Quantizing attention.dense in layer 9/16...
05/07/2023 05:30:34 INFO duration: 0.7198226451873779
05/07/2023 05:30:34 INFO avg loss: 0.2723473172091321
05/07/2023 05:31:07 INFO Quantizing mlp.dense_h_to_4h in layer 9/16...
05/07/2023 05:31:08 INFO duration: 1.0791394710540771
05/07/2023 05:31:08 INFO avg loss: 2.0461749482078675
05/07/2023 05:34:49 INFO Quantizing mlp.dense_4h_to_h in layer 9/16...
05/07/2023 05:34:55 INFO duration: 5.983144044876099
05/07/2023 05:34:55 INFO avg loss: 0.5113805541342186
05/07/2023 05:35:16 INFO Start quantizing layer 10/16
05/07/2023 05:35:49 INFO Quantizing attention.query_key_value in layer 10/16...
05/07/2023 05:35:50 INFO duration: 0.9664998054504395
05/07/2023 05:35:50 INFO avg loss: 7.197037864416933
05/07/2023 05:36:23 INFO Quantizing attention.dense in layer 10/16...
05/07/2023 05:36:24 INFO duration: 0.7181813716888428
05/07/2023 05:36:24 INFO avg loss: 0.3427228673705405
05/07/2023 05:36:57 INFO Quantizing mlp.dense_h_to_4h in layer 10/16...
05/07/2023 05:36:58 INFO duration: 1.0781819820404053
05/07/2023 05:36:58 INFO avg loss: 2.320328880041933
05/07/2023 05:40:40 INFO Quantizing mlp.dense_4h_to_h in layer 10/16...
05/07/2023 05:40:46 INFO duration: 6.027331829071045
05/07/2023 05:40:46 INFO avg loss: 0.6135274056301584
05/07/2023 05:41:07 INFO Start quantizing layer 11/16
05/07/2023 05:41:40 INFO Quantizing attention.query_key_value in layer 11/16...
05/07/2023 05:41:41 INFO duration: 0.9669804573059082
05/07/2023 05:41:41 INFO avg loss: 7.502283845846645
05/07/2023 05:42:14 INFO Quantizing attention.dense in layer 11/16...
05/07/2023 05:42:14 INFO duration: 0.7167062759399414
05/07/2023 05:42:14 INFO avg loss: 0.2933824760591387
05/07/2023 05:42:47 INFO Quantizing mlp.dense_h_to_4h in layer 11/16...
05/07/2023 05:42:48 INFO duration: 1.077958345413208
05/07/2023 05:42:48 INFO avg loss: 2.6354988268769968
05/07/2023 05:46:30 INFO Quantizing mlp.dense_4h_to_h in layer 11/16...
05/07/2023 05:46:36 INFO duration: 5.968295335769653
05/07/2023 05:46:36 INFO avg loss: 0.7737983809238551
05/07/2023 05:46:57 INFO Start quantizing layer 12/16
05/07/2023 05:47:30 INFO Quantizing attention.query_key_value in layer 12/16...
05/07/2023 05:47:31 INFO duration: 0.9708924293518066
05/07/2023 05:47:31 INFO avg loss: 6.875169520433972
05/07/2023 05:48:04 INFO Quantizing attention.dense in layer 12/16...
05/07/2023 05:48:05 INFO duration: 0.7233545780181885
05/07/2023 05:48:05 INFO avg loss: 0.36776245897189497
05/07/2023 05:48:38 INFO Quantizing mlp.dense_h_to_4h in layer 12/16...
05/07/2023 05:48:39 INFO duration: 1.078718900680542
05/07/2023 05:48:39 INFO avg loss: 2.9615547415801386
05/07/2023 05:52:21 INFO Quantizing mlp.dense_4h_to_h in layer 12/16...
05/07/2023 05:52:27 INFO duration: 6.078177452087402
05/07/2023 05:52:27 INFO avg loss: 0.9158687896241015
05/07/2023 05:52:48 INFO Start quantizing layer 13/16
05/07/2023 05:53:21 INFO Quantizing attention.query_key_value in layer 13/16...
05/07/2023 05:53:22 INFO duration: 0.9698812961578369
05/07/2023 05:53:22 INFO avg loss: 5.93688639842918
05/07/2023 05:53:54 INFO Quantizing attention.dense in layer 13/16...
05/07/2023 05:53:55 INFO duration: 0.7205860614776611
05/07/2023 05:53:55 INFO avg loss: 0.24467934637912672
05/07/2023 05:54:28 INFO Quantizing mlp.dense_h_to_4h in layer 13/16...
05/07/2023 05:54:29 INFO duration: 1.0801022052764893
05/07/2023 05:54:29 INFO avg loss: 3.275802466054313
05/07/2023 05:58:11 INFO Quantizing mlp.dense_4h_to_h in layer 13/16...
05/07/2023 05:58:17 INFO duration: 6.09338641166687
05/07/2023 05:58:17 INFO avg loss: 1.0767965265991082
05/07/2023 05:58:38 INFO Start quantizing layer 14/16
05/07/2023 05:59:11 INFO Quantizing attention.query_key_value in layer 14/16...
05/07/2023 05:59:12 INFO duration: 0.9676227569580078
05/07/2023 05:59:12 INFO avg loss: 6.686944638578275
05/07/2023 05:59:45 INFO Quantizing attention.dense in layer 14/16...
05/07/2023 05:59:46 INFO duration: 0.7196416854858398
05/07/2023 05:59:46 INFO avg loss: 0.34242789661541534
05/07/2023 06:00:19 INFO Quantizing mlp.dense_h_to_4h in layer 14/16...
05/07/2023 06:00:20 INFO duration: 1.0829389095306396
05/07/2023 06:00:20 INFO avg loss: 3.705307965588392
05/07/2023 06:04:02 INFO Quantizing mlp.dense_4h_to_h in layer 14/16...
05/07/2023 06:04:08 INFO duration: 6.013010263442993
05/07/2023 06:04:08 INFO avg loss: 1.1975950458433173
05/07/2023 06:04:29 INFO Start quantizing layer 15/16
05/07/2023 06:05:02 INFO Quantizing attention.query_key_value in layer 15/16...
05/07/2023 06:05:03 INFO duration: 0.9704198837280273
05/07/2023 06:05:03 INFO avg loss: 7.567932973908413
05/07/2023 06:05:36 INFO Quantizing attention.dense in layer 15/16...
05/07/2023 06:05:37 INFO duration: 0.7222294807434082
05/07/2023 06:05:37 INFO avg loss: 0.4468821890184039
05/07/2023 06:06:10 INFO Quantizing mlp.dense_h_to_4h in layer 15/16...
05/07/2023 06:06:11 INFO duration: 1.0775363445281982
05/07/2023 06:06:11 INFO avg loss: 4.276716368393903
05/07/2023 06:09:52 INFO Quantizing mlp.dense_4h_to_h in layer 15/16...
05/07/2023 06:09:58 INFO duration: 6.097189664840698
05/07/2023 06:09:58 INFO avg loss: 1.6799194205937167
05/07/2023 06:10:19 INFO Start quantizing layer 16/16
05/07/2023 06:10:52 INFO Quantizing attention.query_key_value in layer 16/16...
05/07/2023 06:10:53 INFO duration: 0.9705617427825928
05/07/2023 06:10:53 INFO avg loss: 7.100380016972843
05/07/2023 06:11:26 INFO Quantizing attention.dense in layer 16/16...
05/07/2023 06:11:27 INFO duration: 0.722510814666748
05/07/2023 06:11:27 INFO avg loss: 0.24434113426330373
05/07/2023 06:12:00 INFO Quantizing mlp.dense_h_to_4h in layer 16/16...
05/07/2023 06:12:01 INFO duration: 1.0826246738433838
05/07/2023 06:12:01 INFO avg loss: 4.788446298422524
05/07/2023 06:15:43 INFO Quantizing mlp.dense_4h_to_h in layer 16/16...
05/07/2023 06:15:49 INFO duration: 6.170569658279419
05/07/2023 06:15:49 INFO avg loss: 1.7897084716536875
05/07/2023 06:16:11 INFO Packing model...
05/07/2023 06:16:11 INFO gpt_neox.layers.0.attention.dense
05/07/2023 06:16:12 INFO gpt_neox.layers.0.attention.query_key_value
05/07/2023 06:16:15 INFO gpt_neox.layers.0.mlp.dense_4h_to_h
05/07/2023 06:16:18 INFO gpt_neox.layers.0.mlp.dense_h_to_4h
05/07/2023 06:16:22 INFO gpt_neox.layers.1.attention.dense
05/07/2023 06:16:23 INFO gpt_neox.layers.1.attention.query_key_value
05/07/2023 06:16:26 INFO gpt_neox.layers.1.mlp.dense_4h_to_h
05/07/2023 06:16:29 INFO gpt_neox.layers.1.mlp.dense_h_to_4h
05/07/2023 06:16:33 INFO gpt_neox.layers.2.attention.dense
05/07/2023 06:16:34 INFO gpt_neox.layers.2.attention.query_key_value
05/07/2023 06:16:37 INFO gpt_neox.layers.2.mlp.dense_4h_to_h
05/07/2023 06:16:40 INFO gpt_neox.layers.2.mlp.dense_h_to_4h
05/07/2023 06:16:44 INFO gpt_neox.layers.3.attention.dense
05/07/2023 06:16:45 INFO gpt_neox.layers.3.attention.query_key_value
05/07/2023 06:16:48 INFO gpt_neox.layers.3.mlp.dense_4h_to_h
05/07/2023 06:16:51 INFO gpt_neox.layers.3.mlp.dense_h_to_4h
05/07/2023 06:16:56 INFO gpt_neox.layers.4.attention.dense
05/07/2023 06:16:56 INFO gpt_neox.layers.4.attention.query_key_value
05/07/2023 06:16:59 INFO gpt_neox.layers.4.mlp.dense_4h_to_h
05/07/2023 06:17:03 INFO gpt_neox.layers.4.mlp.dense_h_to_4h
05/07/2023 06:17:07 INFO gpt_neox.layers.5.attention.dense
05/07/2023 06:17:08 INFO gpt_neox.layers.5.attention.query_key_value
05/07/2023 06:17:11 INFO gpt_neox.layers.5.mlp.dense_4h_to_h
05/07/2023 06:17:14 INFO gpt_neox.layers.5.mlp.dense_h_to_4h
05/07/2023 06:17:18 INFO gpt_neox.layers.6.attention.dense
05/07/2023 06:17:19 INFO gpt_neox.layers.6.attention.query_key_value
05/07/2023 06:17:22 INFO gpt_neox.layers.6.mlp.dense_4h_to_h
05/07/2023 06:17:25 INFO gpt_neox.layers.6.mlp.dense_h_to_4h
05/07/2023 06:17:29 INFO gpt_neox.layers.7.attention.dense
05/07/2023 06:17:30 INFO gpt_neox.layers.7.attention.query_key_value
05/07/2023 06:17:33 INFO gpt_neox.layers.7.mlp.dense_4h_to_h
05/07/2023 06:17:36 INFO gpt_neox.layers.7.mlp.dense_h_to_4h
05/07/2023 06:17:40 INFO gpt_neox.layers.8.attention.dense
05/07/2023 06:17:41 INFO gpt_neox.layers.8.attention.query_key_value
05/07/2023 06:17:44 INFO gpt_neox.layers.8.mlp.dense_4h_to_h
05/07/2023 06:17:47 INFO gpt_neox.layers.8.mlp.dense_h_to_4h
05/07/2023 06:17:51 INFO gpt_neox.layers.9.attention.dense
05/07/2023 06:17:52 INFO gpt_neox.layers.9.attention.query_key_value
05/07/2023 06:17:55 INFO gpt_neox.layers.9.mlp.dense_4h_to_h
05/07/2023 06:17:58 INFO gpt_neox.layers.9.mlp.dense_h_to_4h
05/07/2023 06:18:02 INFO gpt_neox.layers.10.attention.dense
05/07/2023 06:18:03 INFO gpt_neox.layers.10.attention.query_key_value
05/07/2023 06:18:06 INFO gpt_neox.layers.10.mlp.dense_4h_to_h
05/07/2023 06:18:09 INFO gpt_neox.layers.10.mlp.dense_h_to_4h
05/07/2023 06:18:13 INFO gpt_neox.layers.11.attention.dense
05/07/2023 06:18:14 INFO gpt_neox.layers.11.attention.query_key_value
05/07/2023 06:18:17 INFO gpt_neox.layers.11.mlp.dense_4h_to_h
05/07/2023 06:18:20 INFO gpt_neox.layers.11.mlp.dense_h_to_4h
05/07/2023 06:18:24 INFO gpt_neox.layers.12.attention.dense
05/07/2023 06:18:25 INFO gpt_neox.layers.12.attention.query_key_value
05/07/2023 06:18:28 INFO gpt_neox.layers.12.mlp.dense_4h_to_h
05/07/2023 06:18:31 INFO gpt_neox.layers.12.mlp.dense_h_to_4h
05/07/2023 06:18:35 INFO gpt_neox.layers.13.attention.dense
05/07/2023 06:18:36 INFO gpt_neox.layers.13.attention.query_key_value
05/07/2023 06:18:39 INFO gpt_neox.layers.13.mlp.dense_4h_to_h
05/07/2023 06:18:42 INFO gpt_neox.layers.13.mlp.dense_h_to_4h
05/07/2023 06:18:46 INFO gpt_neox.layers.14.attention.dense
05/07/2023 06:18:47 INFO gpt_neox.layers.14.attention.query_key_value
05/07/2023 06:18:50 INFO gpt_neox.layers.14.mlp.dense_4h_to_h
05/07/2023 06:18:53 INFO gpt_neox.layers.14.mlp.dense_h_to_4h
05/07/2023 06:18:57 INFO gpt_neox.layers.15.attention.dense
05/07/2023 06:18:58 INFO gpt_neox.layers.15.attention.query_key_value
05/07/2023 06:19:01 INFO gpt_neox.layers.15.mlp.dense_4h_to_h
05/07/2023 06:19:04 INFO gpt_neox.layers.15.mlp.dense_h_to_4h
05/07/2023 06:19:08 INFO Model packed.
05/07/2023 06:19:08 WARNING using autotune_warmup will move model to GPU, make sure you have enough VRAM to load the whole model.
05/07/2023 06:19:09 INFO Found 4 unique KN Linear values.
05/07/2023 06:19:09 INFO Warming up autotune cache ...
05/07/2023 06:19:58 INFO Done! Saving..
05/07/2023 06:20:05 INFO Saved. Size of the model file(s): 10063.64 MB
05/07/2023 06:20:05 WARNING use_triton will force moving the whole model to GPU, make sure you have enough VRAM.
05/07/2023 06:20:05 INFO embed_out not been quantized, will be ignored when make_quant.
05/07/2023 06:20:06 WARNING The safetensors archive passed at /home/pszemraj/workspace/misc-train/quantization/quantized-models/stablelm-7b-sft-v7-epoch-3-4bit-128g/gptq_model-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
05/07/2023 06:20:06 INFO Found 4 unique KN Linear values.
05/07/2023 06:20:06 INFO Warming up autotune cache ...
05/07/2023 06:20:07 INFO Sample output: ('Because woodchucks (or squirrels, as they\'re also known) are "the chink[e] of wood."')
05/07/2023 06:20:07 INFO GPU memory usage during test inference: 4.61 GB
05/07/2023 06:20:09 WARNING use_triton will force moving the whole model to GPU, make sure you have enough VRAM.
05/07/2023 06:20:09 INFO embed_out not been quantized, will be ignored when make_quant.
05/07/2023 06:20:09 WARNING The safetensors archive passed at /home/pszemraj/workspace/misc-train/quantization/quantized-models/stablelm-7b-sft-v7-epoch-3-4bit-128g/gptq_model-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
05/07/2023 06:20:10 INFO Found 4 unique KN Linear values.
05/07/2023 06:20:10 INFO Warming up autotune cache ...
05/07/2023 06:31:04 WARNING use_triton will force moving the whole model to GPU, make sure you have enough VRAM.
05/07/2023 06:31:04 INFO embed_out not been quantized, will be ignored when make_quant.
05/07/2023 06:31:04 WARNING The safetensors archive passed at /home/pszemraj/workspace/misc-train/quantization/quantized-models/stablelm-7b-sft-v7-epoch-3-4bit-128g/gptq_model-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
05/07/2023 06:31:05 INFO Found 4 unique KN Linear values.
05/07/2023 06:31:05 INFO Warming up autotune cache ...
05/07/2023 06:31:46 WARNING use_triton will force moving the whole model to GPU, make sure you have enough VRAM.
05/07/2023 06:31:46 INFO embed_out not been quantized, will be ignored when make_quant.
05/07/2023 06:31:46 WARNING The safetensors archive passed at /home/pszemraj/workspace/misc-train/quantization/quantized-models/stablelm-7b-sft-v7-epoch-3-4bit-128g/gptq_model-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
05/07/2023 06:31:46 INFO Found 4 unique KN Linear values.
05/07/2023 06:31:46 INFO Warming up autotune cache ...
05/07/2023 06:32:16 WARNING use_triton will force moving the whole model to GPU, make sure you have enough VRAM.
05/07/2023 06:32:16 INFO embed_out not been quantized, will be ignored when make_quant.
05/07/2023 06:32:16 WARNING The safetensors archive passed at /home/pszemraj/workspace/misc-train/quantization/quantized-models/stablelm-7b-sft-v7-epoch-3-4bit-128g/gptq_model-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
05/07/2023 06:32:16 INFO Found 4 unique KN Linear values.
05/07/2023 06:32:16 INFO Warming up autotune cache ...
05/07/2023 06:32:42 WARNING use_triton will force moving the whole model to GPU, make sure you have enough VRAM.
05/07/2023 06:32:42 INFO embed_out not been quantized, will be ignored when make_quant.
05/07/2023 06:32:42 WARNING The safetensors archive passed at /home/pszemraj/workspace/misc-train/quantization/quantized-models/stablelm-7b-sft-v7-epoch-3-4bit-128g/gptq_model-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
05/07/2023 06:32:42 INFO Found 4 unique KN Linear values.
05/07/2023 06:32:42 INFO Warming up autotune cache ...