add quantized version (commit 1e2d4fe)
05/07/2023 04:42:05 WARNING Found cached dataset parquet (/home/pszemraj/.cache/huggingface/datasets/OpenAssistant___parquet/OpenAssistant--oasst1-2960c57d7e52ab15/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
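For reference, the cached dataset above is the GPTQ calibration corpus. A minimal sketch of how it was presumably loaded follows; the actual script is not part of this log, and the tokenizer source and sample count are assumptions:

    # Hypothetical reconstruction of the calibration-data step (not the logged script).
    from datasets import load_dataset
    from transformers import AutoTokenizer

    # Matches the cached OpenAssistant/oasst1 parquet dataset noted above.
    data = load_dataset("OpenAssistant/oasst1", split="train")

    # Tokenizer source is an assumption; the log names only the SFT checkpoint.
    tokenizer = AutoTokenizer.from_pretrained("OpenAssistant/stablelm-7b-sft-v7-epoch-3")

    # GPTQ expects a small list of tokenized calibration examples.
    examples = [
        tokenizer(text, return_tensors="pt")
        for text in data["text"][:128]  # sample count is an assumption
    ]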
05/07/2023 04:42:06 INFO Quantized model will be saved to: /home/pszemraj/workspace/misc-train/quantization/quantized-models/stablelm-7b-sft-v7-epoch-3-4bit-128g
05/07/2023 04:42:14 INFO Running quantization...
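The run itself follows AutoGPTQ's standard quantize-and-save flow. A sketch consistent with the settings in this log (4-bit weights, group size 128, safetensors output), reusing the hypothetical examples list from the sketch above:

    # Hypothetical driver for the run logged below (a sketch, not the logged script).
    from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

    quantize_config = BaseQuantizeConfig(
        bits=4,          # "-4bit-" in the output directory name above
        group_size=128,  # "-128g" in the output directory name above
    )

    model = AutoGPTQForCausalLM.from_pretrained(
        "OpenAssistant/stablelm-7b-sft-v7-epoch-3", quantize_config
    )

    # Runs the layer-by-layer GPTQ passes logged below ("Start quantizing layer i/16").
    model.quantize(examples)

    # Writes gptq_model-4bit-128g.safetensors into the directory named above.
    model.save_quantized(
        "quantized-models/stablelm-7b-sft-v7-epoch-3-4bit-128g",
        use_safetensors=True,
    )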
05/07/2023 04:42:16 INFO Start quantizing layer 1/16
05/07/2023 04:42:49 INFO Quantizing attention.query_key_value in layer 1/16...
05/07/2023 04:42:50 INFO duration: 1.0365328788757324
05/07/2023 04:42:50 INFO avg loss: 0.2228083991395018
05/07/2023 04:43:23 INFO Quantizing attention.dense in layer 1/16...
05/07/2023 04:43:24 INFO duration: 0.7084124088287354
05/07/2023 04:43:24 INFO avg loss: 0.01904001936744958
05/07/2023 04:43:57 INFO Quantizing mlp.dense_h_to_4h in layer 1/16...
05/07/2023 04:43:58 INFO duration: 1.0652313232421875
05/07/2023 04:43:58 INFO avg loss: 0.3040119207705
05/07/2023 04:47:44 INFO Quantizing mlp.dense_4h_to_h in layer 1/16...
05/07/2023 04:47:51 INFO duration: 6.762867212295532
05/07/2023 04:47:51 INFO avg loss: 0.028748639221516405
05/07/2023 04:48:12 INFO Start quantizing layer 2/16
05/07/2023 04:48:45 INFO Quantizing attention.query_key_value in layer 2/16...
05/07/2023 04:48:46 INFO duration: 0.9713742733001709
05/07/2023 04:48:46 INFO avg loss: 0.35355199259310105
05/07/2023 04:49:19 INFO Quantizing attention.dense in layer 2/16...
05/07/2023 04:49:20 INFO duration: 0.7275807857513428
05/07/2023 04:49:20 INFO avg loss: 0.06647738861961487
05/07/2023 04:49:53 INFO Quantizing mlp.dense_h_to_4h in layer 2/16...
05/07/2023 04:49:54 INFO duration: 1.083951711654663
05/07/2023 04:49:54 INFO avg loss: 0.6772610437882721
05/07/2023 04:53:40 INFO Quantizing mlp.dense_4h_to_h in layer 2/16...
05/07/2023 04:53:47 INFO duration: 6.844736814498901
05/07/2023 04:53:47 INFO avg loss: 0.05320497620473908
05/07/2023 04:54:08 INFO Start quantizing layer 3/16
05/07/2023 04:54:41 INFO Quantizing attention.query_key_value in layer 3/16...
05/07/2023 04:54:42 INFO duration: 0.9685044288635254
05/07/2023 04:54:42 INFO avg loss: 0.6015139448756989
05/07/2023 04:55:15 INFO Quantizing attention.dense in layer 3/16...
05/07/2023 04:55:16 INFO duration: 0.7167198657989502
05/07/2023 04:55:16 INFO avg loss: 0.06039099241344058
05/07/2023 04:55:49 INFO Quantizing mlp.dense_h_to_4h in layer 3/16...
05/07/2023 04:55:50 INFO duration: 1.0765190124511719
05/07/2023 04:55:50 INFO avg loss: 1.3903707193490416
05/07/2023 04:59:37 INFO Quantizing mlp.dense_4h_to_h in layer 3/16...
05/07/2023 04:59:43 INFO duration: 6.270395040512085
05/07/2023 04:59:43 INFO avg loss: 0.181059166011465
05/07/2023 05:00:04 INFO Start quantizing layer 4/16
05/07/2023 05:00:37 INFO Quantizing attention.query_key_value in layer 4/16...
05/07/2023 05:00:38 INFO duration: 0.9672496318817139
05/07/2023 05:00:38 INFO avg loss: 0.9807066506090255
05/07/2023 05:01:11 INFO Quantizing attention.dense in layer 4/16...
05/07/2023 05:01:12 INFO duration: 0.7248861789703369
05/07/2023 05:01:12 INFO avg loss: 0.1315788618418863
05/07/2023 05:01:45 INFO Quantizing mlp.dense_h_to_4h in layer 4/16...
05/07/2023 05:01:46 INFO duration: 1.083066463470459
05/07/2023 05:01:46 INFO avg loss: 2.080002984807641
05/07/2023 05:05:32 INFO Quantizing mlp.dense_4h_to_h in layer 4/16...
05/07/2023 05:05:38 INFO duration: 6.18793797492981
05/07/2023 05:05:38 INFO avg loss: 0.252437506240016
05/07/2023 05:05:59 INFO Start quantizing layer 5/16
05/07/2023 05:06:32 INFO Quantizing attention.query_key_value in layer 5/16...
05/07/2023 05:06:33 INFO duration: 0.9693779945373535
05/07/2023 05:06:33 INFO avg loss: 1.3782398682940629
05/07/2023 05:07:06 INFO Quantizing attention.dense in layer 5/16...
05/07/2023 05:07:07 INFO duration: 0.7210879325866699
05/07/2023 05:07:07 INFO avg loss: 0.14899523392779884
05/07/2023 05:07:40 INFO Quantizing mlp.dense_h_to_4h in layer 5/16...
05/07/2023 05:07:41 INFO duration: 1.0800914764404297
05/07/2023 05:07:41 INFO avg loss: 2.332041130025293
05/07/2023 05:11:27 INFO Quantizing mlp.dense_4h_to_h in layer 5/16...
05/07/2023 05:11:33 INFO duration: 6.191901206970215
05/07/2023 05:11:33 INFO avg loss: 0.3255492384060503
05/07/2023 05:11:54 INFO Start quantizing layer 6/16
05/07/2023 05:12:27 INFO Quantizing attention.query_key_value in layer 6/16...
05/07/2023 05:12:28 INFO duration: 0.9662725925445557
05/07/2023 05:12:28 INFO avg loss: 1.757845780085197
05/07/2023 05:13:01 INFO Quantizing attention.dense in layer 6/16...
05/07/2023 05:13:02 INFO duration: 0.7185342311859131
05/07/2023 05:13:02 INFO avg loss: 0.15947506450616514
05/07/2023 05:13:35 INFO Quantizing mlp.dense_h_to_4h in layer 6/16...
05/07/2023 05:13:36 INFO duration: 1.075429916381836
05/07/2023 05:13:36 INFO avg loss: 2.4491654498635516
05/07/2023 05:17:18 INFO Quantizing mlp.dense_4h_to_h in layer 6/16...
05/07/2023 05:17:24 INFO duration: 5.919256925582886
05/07/2023 05:17:24 INFO avg loss: 0.40534172017480363
05/07/2023 05:17:45 INFO Start quantizing layer 7/16
05/07/2023 05:18:18 INFO Quantizing attention.query_key_value in layer 7/16...
05/07/2023 05:18:19 INFO duration: 0.9676733016967773
05/07/2023 05:18:19 INFO avg loss: 2.131913417698349
05/07/2023 05:18:52 INFO Quantizing attention.dense in layer 7/16...
05/07/2023 05:18:53 INFO duration: 0.7196581363677979
05/07/2023 05:18:53 INFO avg loss: 0.20212076367915502
05/07/2023 05:19:26 INFO Quantizing mlp.dense_h_to_4h in layer 7/16...
05/07/2023 05:19:27 INFO duration: 1.0817346572875977
05/07/2023 05:19:27 INFO avg loss: 2.4321377462726304
05/07/2023 05:23:08 INFO Quantizing mlp.dense_4h_to_h in layer 7/16...
05/07/2023 05:23:14 INFO duration: 5.973307132720947
05/07/2023 05:23:14 INFO avg loss: 0.4796293378511049
05/07/2023 05:23:35 INFO Start quantizing layer 8/16
05/07/2023 05:24:08 INFO Quantizing attention.query_key_value in layer 8/16...
05/07/2023 05:24:09 INFO duration: 0.9668700695037842
05/07/2023 05:24:09 INFO avg loss: 2.3333008332501333
05/07/2023 05:24:42 INFO Quantizing attention.dense in layer 8/16...
05/07/2023 05:24:43 INFO duration: 0.7205338478088379
05/07/2023 05:24:43 INFO avg loss: 0.2906766491322218
05/07/2023 05:25:16 INFO Quantizing mlp.dense_h_to_4h in layer 8/16...
05/07/2023 05:25:17 INFO duration: 1.075392246246338
05/07/2023 05:25:17 INFO avg loss: 2.088160245690229
05/07/2023 05:28:59 INFO Quantizing mlp.dense_4h_to_h in layer 8/16...
05/07/2023 05:29:05 INFO duration: 6.0966198444366455
05/07/2023 05:29:05 INFO avg loss: 0.4126856014751398
05/07/2023 05:29:26 INFO Start quantizing layer 9/16
05/07/2023 05:29:59 INFO Quantizing attention.query_key_value in layer 9/16...
05/07/2023 05:30:00 INFO duration: 0.971062183380127
05/07/2023 05:30:00 INFO avg loss: 4.631909777689031
05/07/2023 05:30:33 INFO Quantizing attention.dense in layer 9/16...
05/07/2023 05:30:34 INFO duration: 0.7198226451873779
05/07/2023 05:30:34 INFO avg loss: 0.2723473172091321
05/07/2023 05:31:07 INFO Quantizing mlp.dense_h_to_4h in layer 9/16...
05/07/2023 05:31:08 INFO duration: 1.0791394710540771
05/07/2023 05:31:08 INFO avg loss: 2.0461749482078675
05/07/2023 05:34:49 INFO Quantizing mlp.dense_4h_to_h in layer 9/16...
05/07/2023 05:34:55 INFO duration: 5.983144044876099
05/07/2023 05:34:55 INFO avg loss: 0.5113805541342186
05/07/2023 05:35:16 INFO Start quantizing layer 10/16
05/07/2023 05:35:49 INFO Quantizing attention.query_key_value in layer 10/16...
05/07/2023 05:35:50 INFO duration: 0.9664998054504395
05/07/2023 05:35:50 INFO avg loss: 7.197037864416933
05/07/2023 05:36:23 INFO Quantizing attention.dense in layer 10/16...
05/07/2023 05:36:24 INFO duration: 0.7181813716888428
05/07/2023 05:36:24 INFO avg loss: 0.3427228673705405
05/07/2023 05:36:57 INFO Quantizing mlp.dense_h_to_4h in layer 10/16...
05/07/2023 05:36:58 INFO duration: 1.0781819820404053
05/07/2023 05:36:58 INFO avg loss: 2.320328880041933
05/07/2023 05:40:40 INFO Quantizing mlp.dense_4h_to_h in layer 10/16...
05/07/2023 05:40:46 INFO duration: 6.027331829071045
05/07/2023 05:40:46 INFO avg loss: 0.6135274056301584
05/07/2023 05:41:07 INFO Start quantizing layer 11/16
05/07/2023 05:41:40 INFO Quantizing attention.query_key_value in layer 11/16...
05/07/2023 05:41:41 INFO duration: 0.9669804573059082
05/07/2023 05:41:41 INFO avg loss: 7.502283845846645
05/07/2023 05:42:14 INFO Quantizing attention.dense in layer 11/16...
05/07/2023 05:42:14 INFO duration: 0.7167062759399414
05/07/2023 05:42:14 INFO avg loss: 0.2933824760591387
05/07/2023 05:42:47 INFO Quantizing mlp.dense_h_to_4h in layer 11/16...
05/07/2023 05:42:48 INFO duration: 1.077958345413208
05/07/2023 05:42:48 INFO avg loss: 2.6354988268769968
05/07/2023 05:46:30 INFO Quantizing mlp.dense_4h_to_h in layer 11/16...
05/07/2023 05:46:36 INFO duration: 5.968295335769653
05/07/2023 05:46:36 INFO avg loss: 0.7737983809238551
05/07/2023 05:46:57 INFO Start quantizing layer 12/16
05/07/2023 05:47:30 INFO Quantizing attention.query_key_value in layer 12/16...
05/07/2023 05:47:31 INFO duration: 0.9708924293518066
05/07/2023 05:47:31 INFO avg loss: 6.875169520433972
05/07/2023 05:48:04 INFO Quantizing attention.dense in layer 12/16...
05/07/2023 05:48:05 INFO duration: 0.7233545780181885
05/07/2023 05:48:05 INFO avg loss: 0.36776245897189497
05/07/2023 05:48:38 INFO Quantizing mlp.dense_h_to_4h in layer 12/16...
05/07/2023 05:48:39 INFO duration: 1.078718900680542
05/07/2023 05:48:39 INFO avg loss: 2.9615547415801386
05/07/2023 05:52:21 INFO Quantizing mlp.dense_4h_to_h in layer 12/16...
05/07/2023 05:52:27 INFO duration: 6.078177452087402
05/07/2023 05:52:27 INFO avg loss: 0.9158687896241015
05/07/2023 05:52:48 INFO Start quantizing layer 13/16
05/07/2023 05:53:21 INFO Quantizing attention.query_key_value in layer 13/16...
05/07/2023 05:53:22 INFO duration: 0.9698812961578369
05/07/2023 05:53:22 INFO avg loss: 5.93688639842918
05/07/2023 05:53:54 INFO Quantizing attention.dense in layer 13/16...
05/07/2023 05:53:55 INFO duration: 0.7205860614776611
05/07/2023 05:53:55 INFO avg loss: 0.24467934637912672
05/07/2023 05:54:28 INFO Quantizing mlp.dense_h_to_4h in layer 13/16...
05/07/2023 05:54:29 INFO duration: 1.0801022052764893
05/07/2023 05:54:29 INFO avg loss: 3.275802466054313
05/07/2023 05:58:11 INFO Quantizing mlp.dense_4h_to_h in layer 13/16...
05/07/2023 05:58:17 INFO duration: 6.09338641166687
05/07/2023 05:58:17 INFO avg loss: 1.0767965265991082
05/07/2023 05:58:38 INFO Start quantizing layer 14/16
05/07/2023 05:59:11 INFO Quantizing attention.query_key_value in layer 14/16...
05/07/2023 05:59:12 INFO duration: 0.9676227569580078
05/07/2023 05:59:12 INFO avg loss: 6.686944638578275
05/07/2023 05:59:45 INFO Quantizing attention.dense in layer 14/16...
05/07/2023 05:59:46 INFO duration: 0.7196416854858398
05/07/2023 05:59:46 INFO avg loss: 0.34242789661541534
05/07/2023 06:00:19 INFO Quantizing mlp.dense_h_to_4h in layer 14/16...
05/07/2023 06:00:20 INFO duration: 1.0829389095306396
05/07/2023 06:00:20 INFO avg loss: 3.705307965588392
05/07/2023 06:04:02 INFO Quantizing mlp.dense_4h_to_h in layer 14/16...
05/07/2023 06:04:08 INFO duration: 6.013010263442993
05/07/2023 06:04:08 INFO avg loss: 1.1975950458433173
05/07/2023 06:04:29 INFO Start quantizing layer 15/16
05/07/2023 06:05:02 INFO Quantizing attention.query_key_value in layer 15/16...
05/07/2023 06:05:03 INFO duration: 0.9704198837280273
05/07/2023 06:05:03 INFO avg loss: 7.567932973908413
05/07/2023 06:05:36 INFO Quantizing attention.dense in layer 15/16...
05/07/2023 06:05:37 INFO duration: 0.7222294807434082
05/07/2023 06:05:37 INFO avg loss: 0.4468821890184039
05/07/2023 06:06:10 INFO Quantizing mlp.dense_h_to_4h in layer 15/16...
05/07/2023 06:06:11 INFO duration: 1.0775363445281982
05/07/2023 06:06:11 INFO avg loss: 4.276716368393903
05/07/2023 06:09:52 INFO Quantizing mlp.dense_4h_to_h in layer 15/16...
05/07/2023 06:09:58 INFO duration: 6.097189664840698
05/07/2023 06:09:58 INFO avg loss: 1.6799194205937167
05/07/2023 06:10:19 INFO Start quantizing layer 16/16
05/07/2023 06:10:52 INFO Quantizing attention.query_key_value in layer 16/16...
05/07/2023 06:10:53 INFO duration: 0.9705617427825928
05/07/2023 06:10:53 INFO avg loss: 7.100380016972843
05/07/2023 06:11:26 INFO Quantizing attention.dense in layer 16/16...
05/07/2023 06:11:27 INFO duration: 0.722510814666748
05/07/2023 06:11:27 INFO avg loss: 0.24434113426330373
05/07/2023 06:12:00 INFO Quantizing mlp.dense_h_to_4h in layer 16/16...
05/07/2023 06:12:01 INFO duration: 1.0826246738433838
05/07/2023 06:12:01 INFO avg loss: 4.788446298422524
05/07/2023 06:15:43 INFO Quantizing mlp.dense_4h_to_h in layer 16/16...
05/07/2023 06:15:49 INFO duration: 6.170569658279419
05/07/2023 06:15:49 INFO avg loss: 1.7897084716536875
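For context on the "avg loss" figures above: GPTQ quantizes each linear module to minimize the squared reconstruction error of its output on the calibration batch, so each reported value is roughly the per-module residual

    \mathcal{L} \approx \frac{1}{N} \left\lVert W X - \hat{W} X \right\rVert_F^2

where W is the original weight matrix, \hat{W} its 4-bit quantized counterpart, X the calibration activations, and N the number of calibration samples. The drift from about 0.2 in layer 1 to about 7 for query_key_value in later layers reflects unnormalized error growing with activation scale across depth, which is typical of GPTQ runs rather than a sign of a problem.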
05/07/2023 06:16:11 INFO Packing model...
05/07/2023 06:16:11 INFO gpt_neox.layers.0.attention.dense
05/07/2023 06:16:12 INFO gpt_neox.layers.0.attention.query_key_value
05/07/2023 06:16:15 INFO gpt_neox.layers.0.mlp.dense_4h_to_h
05/07/2023 06:16:18 INFO gpt_neox.layers.0.mlp.dense_h_to_4h
05/07/2023 06:16:22 INFO gpt_neox.layers.1.attention.dense
05/07/2023 06:16:23 INFO gpt_neox.layers.1.attention.query_key_value
05/07/2023 06:16:26 INFO gpt_neox.layers.1.mlp.dense_4h_to_h
05/07/2023 06:16:29 INFO gpt_neox.layers.1.mlp.dense_h_to_4h
05/07/2023 06:16:33 INFO gpt_neox.layers.2.attention.dense
05/07/2023 06:16:34 INFO gpt_neox.layers.2.attention.query_key_value
05/07/2023 06:16:37 INFO gpt_neox.layers.2.mlp.dense_4h_to_h
05/07/2023 06:16:40 INFO gpt_neox.layers.2.mlp.dense_h_to_4h
05/07/2023 06:16:44 INFO gpt_neox.layers.3.attention.dense
05/07/2023 06:16:45 INFO gpt_neox.layers.3.attention.query_key_value
05/07/2023 06:16:48 INFO gpt_neox.layers.3.mlp.dense_4h_to_h
05/07/2023 06:16:51 INFO gpt_neox.layers.3.mlp.dense_h_to_4h
05/07/2023 06:16:56 INFO gpt_neox.layers.4.attention.dense
05/07/2023 06:16:56 INFO gpt_neox.layers.4.attention.query_key_value
05/07/2023 06:16:59 INFO gpt_neox.layers.4.mlp.dense_4h_to_h
05/07/2023 06:17:03 INFO gpt_neox.layers.4.mlp.dense_h_to_4h
05/07/2023 06:17:07 INFO gpt_neox.layers.5.attention.dense
05/07/2023 06:17:08 INFO gpt_neox.layers.5.attention.query_key_value
05/07/2023 06:17:11 INFO gpt_neox.layers.5.mlp.dense_4h_to_h
05/07/2023 06:17:14 INFO gpt_neox.layers.5.mlp.dense_h_to_4h
05/07/2023 06:17:18 INFO gpt_neox.layers.6.attention.dense
05/07/2023 06:17:19 INFO gpt_neox.layers.6.attention.query_key_value
05/07/2023 06:17:22 INFO gpt_neox.layers.6.mlp.dense_4h_to_h
05/07/2023 06:17:25 INFO gpt_neox.layers.6.mlp.dense_h_to_4h
05/07/2023 06:17:29 INFO gpt_neox.layers.7.attention.dense
05/07/2023 06:17:30 INFO gpt_neox.layers.7.attention.query_key_value
05/07/2023 06:17:33 INFO gpt_neox.layers.7.mlp.dense_4h_to_h
05/07/2023 06:17:36 INFO gpt_neox.layers.7.mlp.dense_h_to_4h
05/07/2023 06:17:40 INFO gpt_neox.layers.8.attention.dense
05/07/2023 06:17:41 INFO gpt_neox.layers.8.attention.query_key_value
05/07/2023 06:17:44 INFO gpt_neox.layers.8.mlp.dense_4h_to_h
05/07/2023 06:17:47 INFO gpt_neox.layers.8.mlp.dense_h_to_4h
05/07/2023 06:17:51 INFO gpt_neox.layers.9.attention.dense
05/07/2023 06:17:52 INFO gpt_neox.layers.9.attention.query_key_value
05/07/2023 06:17:55 INFO gpt_neox.layers.9.mlp.dense_4h_to_h
05/07/2023 06:17:58 INFO gpt_neox.layers.9.mlp.dense_h_to_4h
05/07/2023 06:18:02 INFO gpt_neox.layers.10.attention.dense
05/07/2023 06:18:03 INFO gpt_neox.layers.10.attention.query_key_value
05/07/2023 06:18:06 INFO gpt_neox.layers.10.mlp.dense_4h_to_h
05/07/2023 06:18:09 INFO gpt_neox.layers.10.mlp.dense_h_to_4h
05/07/2023 06:18:13 INFO gpt_neox.layers.11.attention.dense
05/07/2023 06:18:14 INFO gpt_neox.layers.11.attention.query_key_value
05/07/2023 06:18:17 INFO gpt_neox.layers.11.mlp.dense_4h_to_h
05/07/2023 06:18:20 INFO gpt_neox.layers.11.mlp.dense_h_to_4h
05/07/2023 06:18:24 INFO gpt_neox.layers.12.attention.dense
05/07/2023 06:18:25 INFO gpt_neox.layers.12.attention.query_key_value
05/07/2023 06:18:28 INFO gpt_neox.layers.12.mlp.dense_4h_to_h
05/07/2023 06:18:31 INFO gpt_neox.layers.12.mlp.dense_h_to_4h
05/07/2023 06:18:35 INFO gpt_neox.layers.13.attention.dense
05/07/2023 06:18:36 INFO gpt_neox.layers.13.attention.query_key_value
05/07/2023 06:18:39 INFO gpt_neox.layers.13.mlp.dense_4h_to_h
05/07/2023 06:18:42 INFO gpt_neox.layers.13.mlp.dense_h_to_4h
05/07/2023 06:18:46 INFO gpt_neox.layers.14.attention.dense
05/07/2023 06:18:47 INFO gpt_neox.layers.14.attention.query_key_value
05/07/2023 06:18:50 INFO gpt_neox.layers.14.mlp.dense_4h_to_h
05/07/2023 06:18:53 INFO gpt_neox.layers.14.mlp.dense_h_to_4h
05/07/2023 06:18:57 INFO gpt_neox.layers.15.attention.dense
05/07/2023 06:18:58 INFO gpt_neox.layers.15.attention.query_key_value
05/07/2023 06:19:01 INFO gpt_neox.layers.15.mlp.dense_4h_to_h
05/07/2023 06:19:04 INFO gpt_neox.layers.15.mlp.dense_h_to_4h
05/07/2023 06:19:08 INFO Model packed.
05/07/2023 06:19:08 WARNING using autotune_warmup will move the model to GPU; make sure you have enough VRAM to load the whole model.
05/07/2023 06:19:09 INFO Found 4 unique KN Linear values.
05/07/2023 06:19:09 INFO Warming up autotune cache ...
05/07/2023 06:19:58 INFO Done! Saving...
05/07/2023 06:20:05 INFO Saved. Size of the model file(s): 10063.64 MB
05/07/2023 06:20:05 WARNING use_triton will force moving the whole model to GPU; make sure you have enough VRAM.
05/07/2023 06:20:05 INFO embed_out has not been quantized and will be ignored by make_quant.
05/07/2023 06:20:06 WARNING The safetensors archive passed at /home/pszemraj/workspace/misc-train/quantization/quantized-models/stablelm-7b-sft-v7-epoch-3-4bit-128g/gptq_model-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
05/07/2023 06:20:06 INFO Found 4 unique KN Linear values.
05/07/2023 06:20:06 INFO Warming up autotune cache ...
05/07/2023 06:20:07 INFO Sample output: 'Because woodchucks (or squirrels, as they're also known) are "the chink[e] of wood."'
05/07/2023 06:20:07 INFO GPU memory usage during test inference: 4.61 GB
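The reload-and-test step above can be reproduced with AutoGPTQ's standard loading path. A minimal sketch; the prompt is an assumption, since the log records only the model's reply:

    # Hypothetical reload-and-generate step matching the warnings and metrics above.
    from auto_gptq import AutoGPTQForCausalLM
    from transformers import AutoTokenizer

    model_dir = "quantized-models/stablelm-7b-sft-v7-epoch-3-4bit-128g"
    tokenizer = AutoTokenizer.from_pretrained(model_dir)

    # use_triton=True is what triggers the "use_triton will force moving the whole
    # model to GPU" warning and the autotune-cache warmup seen in this log.
    model = AutoGPTQForCausalLM.from_quantized(
        model_dir, device="cuda:0", use_safetensors=True, use_triton=True
    )

    # Prompt is an assumption; the sample output suggests the classic woodchuck riddle.
    inputs = tokenizer("Why can a woodchuck chuck wood?", return_tensors="pt").to("cuda:0")
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))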
05/07/2023 06:20:09 WARNING use_triton will force moving the whole model to GPU; make sure you have enough VRAM.
05/07/2023 06:20:09 INFO embed_out has not been quantized and will be ignored by make_quant.
05/07/2023 06:20:09 WARNING The safetensors archive passed at /home/pszemraj/workspace/misc-train/quantization/quantized-models/stablelm-7b-sft-v7-epoch-3-4bit-128g/gptq_model-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
05/07/2023 06:20:10 INFO Found 4 unique KN Linear values.
05/07/2023 06:20:10 INFO Warming up autotune cache ...
05/07/2023 06:31:04 WARNING use_triton will force moving the whole model to GPU; make sure you have enough VRAM.
05/07/2023 06:31:04 INFO embed_out has not been quantized and will be ignored by make_quant.
05/07/2023 06:31:04 WARNING The safetensors archive passed at /home/pszemraj/workspace/misc-train/quantization/quantized-models/stablelm-7b-sft-v7-epoch-3-4bit-128g/gptq_model-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
05/07/2023 06:31:05 INFO Found 4 unique KN Linear values.
05/07/2023 06:31:05 INFO Warming up autotune cache ...
05/07/2023 06:31:46 WARNING use_triton will force moving the whole model to GPU; make sure you have enough VRAM.
05/07/2023 06:31:46 INFO embed_out has not been quantized and will be ignored by make_quant.
05/07/2023 06:31:46 WARNING The safetensors archive passed at /home/pszemraj/workspace/misc-train/quantization/quantized-models/stablelm-7b-sft-v7-epoch-3-4bit-128g/gptq_model-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
05/07/2023 06:31:46 INFO Found 4 unique KN Linear values.
05/07/2023 06:31:46 INFO Warming up autotune cache ...
05/07/2023 06:32:16 WARNING use_triton will force moving the whole model to GPU; make sure you have enough VRAM.
05/07/2023 06:32:16 INFO embed_out has not been quantized and will be ignored by make_quant.
05/07/2023 06:32:16 WARNING The safetensors archive passed at /home/pszemraj/workspace/misc-train/quantization/quantized-models/stablelm-7b-sft-v7-epoch-3-4bit-128g/gptq_model-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
05/07/2023 06:32:16 INFO Found 4 unique KN Linear values.
05/07/2023 06:32:16 INFO Warming up autotune cache ...
05/07/2023 06:32:42 WARNING use_triton will force moving the whole model to GPU; make sure you have enough VRAM.
05/07/2023 06:32:42 INFO embed_out has not been quantized and will be ignored by make_quant.
05/07/2023 06:32:42 WARNING The safetensors archive passed at /home/pszemraj/workspace/misc-train/quantization/quantized-models/stablelm-7b-sft-v7-epoch-3-4bit-128g/gptq_model-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
05/07/2023 06:32:42 INFO Found 4 unique KN Linear values.
05/07/2023 06:32:42 INFO Warming up autotune cache ...