05/07/2023 04:42:05 WARNING Found cached dataset parquet (/home/pszemraj/.cache/huggingface/datasets/OpenAssistant___parquet/OpenAssistant--oasst1-2960c57d7e52ab15/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
05/07/2023 04:42:06 WARNING No such comm: c8c073cce7994da5b454ed0300090049
05/07/2023 04:42:06 WARNING No such comm: 1103c6a0950249ca863ebc8399fddfef
05/07/2023 04:42:06 WARNING No such comm: 5c3ce017525f4406904695297ace8724
05/07/2023 04:42:06 WARNING No such comm: c5ceaf44ed3942cdb730705e230f024b
05/07/2023 04:42:06 WARNING No such comm: f953c7265b2248c98cc4dbe971b44f3d
05/07/2023 04:42:06 WARNING No such comm: 687a131767524803a41093a1d84f4652
05/07/2023 04:42:06 WARNING No such comm: 93293aa5cce946bc8c6aa6ee4d0eaeb1
05/07/2023 04:42:06 WARNING No such comm: 637d46ef1d57406a817ef020d0c7bf06
05/07/2023 04:42:06 WARNING No such comm: 494913a72a3b4802b2390b58f38a3a36
05/07/2023 04:42:06 WARNING No such comm: 2678191b17564118a9e16b1201d9b4d2
05/07/2023 04:42:06 WARNING No such comm: 891bcbcf176840789f36c723e386c9b9
05/07/2023 04:42:06 INFO Quantized model will be saved to: /home/pszemraj/workspace/misc-train/quantization/quantized-models/stablelm-7b-sft-v7-epoch-3-4bit-128g
05/07/2023 04:42:14 INFO Running quantization..
05/07/2023 04:42:16 INFO Start quantizing layer 1/16
05/07/2023 04:42:49 INFO Quantizing attention.query_key_value in layer 1/16...
05/07/2023 04:42:50 INFO duration: 1.0365328788757324
05/07/2023 04:42:50 INFO avg loss: 0.2228083991395018
05/07/2023 04:43:23 INFO Quantizing attention.dense in layer 1/16...
05/07/2023 04:43:24 INFO duration: 0.7084124088287354
05/07/2023 04:43:24 INFO avg loss: 0.01904001936744958
05/07/2023 04:43:57 INFO Quantizing mlp.dense_h_to_4h in layer 1/16...
05/07/2023 04:43:58 INFO duration: 1.0652313232421875
05/07/2023 04:43:58 INFO avg loss: 0.3040119207705
05/07/2023 04:47:44 INFO Quantizing mlp.dense_4h_to_h in layer 1/16...
05/07/2023 04:47:51 INFO duration: 6.762867212295532
05/07/2023 04:47:51 INFO avg loss: 0.028748639221516405
05/07/2023 04:48:12 INFO Start quantizing layer 2/16
05/07/2023 04:48:45 INFO Quantizing attention.query_key_value in layer 2/16...
05/07/2023 04:48:46 INFO duration: 0.9713742733001709
05/07/2023 04:48:46 INFO avg loss: 0.35355199259310105
05/07/2023 04:49:19 INFO Quantizing attention.dense in layer 2/16...
05/07/2023 04:49:20 INFO duration: 0.7275807857513428
05/07/2023 04:49:20 INFO avg loss: 0.06647738861961487
05/07/2023 04:49:53 INFO Quantizing mlp.dense_h_to_4h in layer 2/16...
05/07/2023 04:49:54 INFO duration: 1.083951711654663
05/07/2023 04:49:54 INFO avg loss: 0.6772610437882721
05/07/2023 04:53:40 INFO Quantizing mlp.dense_4h_to_h in layer 2/16...
05/07/2023 04:53:47 INFO duration: 6.844736814498901
05/07/2023 04:53:47 INFO avg loss: 0.05320497620473908
05/07/2023 04:54:08 INFO Start quantizing layer 3/16
05/07/2023 04:54:41 INFO Quantizing attention.query_key_value in layer 3/16...
05/07/2023 04:54:42 INFO duration: 0.9685044288635254
05/07/2023 04:54:42 INFO avg loss: 0.6015139448756989
05/07/2023 04:55:15 INFO Quantizing attention.dense in layer 3/16...
05/07/2023 04:55:16 INFO duration: 0.7167198657989502
05/07/2023 04:55:16 INFO avg loss: 0.06039099241344058
05/07/2023 04:55:49 INFO Quantizing mlp.dense_h_to_4h in layer 3/16...
05/07/2023 04:55:50 INFO duration: 1.0765190124511719
05/07/2023 04:55:50 INFO avg loss: 1.3903707193490416
05/07/2023 04:59:37 INFO Quantizing mlp.dense_4h_to_h in layer 3/16...
05/07/2023 04:59:43 INFO duration: 6.270395040512085
05/07/2023 04:59:43 INFO avg loss: 0.181059166011465
05/07/2023 05:00:04 INFO Start quantizing layer 4/16
05/07/2023 05:00:37 INFO Quantizing attention.query_key_value in layer 4/16...
05/07/2023 05:00:38 INFO duration: 0.9672496318817139
05/07/2023 05:00:38 INFO avg loss: 0.9807066506090255
05/07/2023 05:01:11 INFO Quantizing attention.dense in layer 4/16...
05/07/2023 05:01:12 INFO duration: 0.7248861789703369
05/07/2023 05:01:12 INFO avg loss: 0.1315788618418863
05/07/2023 05:01:45 INFO Quantizing mlp.dense_h_to_4h in layer 4/16...
05/07/2023 05:01:46 INFO duration: 1.083066463470459
05/07/2023 05:01:46 INFO avg loss: 2.080002984807641
05/07/2023 05:05:32 INFO Quantizing mlp.dense_4h_to_h in layer 4/16...
05/07/2023 05:05:38 INFO duration: 6.18793797492981
05/07/2023 05:05:38 INFO avg loss: 0.252437506240016
05/07/2023 05:05:59 INFO Start quantizing layer 5/16
05/07/2023 05:06:32 INFO Quantizing attention.query_key_value in layer 5/16...
05/07/2023 05:06:33 INFO duration: 0.9693779945373535
05/07/2023 05:06:33 INFO avg loss: 1.3782398682940629
05/07/2023 05:07:06 INFO Quantizing attention.dense in layer 5/16...
05/07/2023 05:07:07 INFO duration: 0.7210879325866699
05/07/2023 05:07:07 INFO avg loss: 0.14899523392779884
05/07/2023 05:07:40 INFO Quantizing mlp.dense_h_to_4h in layer 5/16...
05/07/2023 05:07:41 INFO duration: 1.0800914764404297
05/07/2023 05:07:41 INFO avg loss: 2.332041130025293
05/07/2023 05:11:27 INFO Quantizing mlp.dense_4h_to_h in layer 5/16...
05/07/2023 05:11:33 INFO duration: 6.191901206970215
05/07/2023 05:11:33 INFO avg loss: 0.3255492384060503
05/07/2023 05:11:54 INFO Start quantizing layer 6/16
05/07/2023 05:12:27 INFO Quantizing attention.query_key_value in layer 6/16...
05/07/2023 05:12:28 INFO duration: 0.9662725925445557
05/07/2023 05:12:28 INFO avg loss: 1.757845780085197
05/07/2023 05:13:01 INFO Quantizing attention.dense in layer 6/16...
05/07/2023 05:13:02 INFO duration: 0.7185342311859131
05/07/2023 05:13:02 INFO avg loss: 0.15947506450616514
05/07/2023 05:13:35 INFO Quantizing mlp.dense_h_to_4h in layer 6/16...
05/07/2023 05:13:36 INFO duration: 1.075429916381836
05/07/2023 05:13:36 INFO avg loss: 2.4491654498635516
05/07/2023 05:17:18 INFO Quantizing mlp.dense_4h_to_h in layer 6/16...
05/07/2023 05:17:24 INFO duration: 5.919256925582886
05/07/2023 05:17:24 INFO avg loss: 0.40534172017480363
05/07/2023 05:17:45 INFO Start quantizing layer 7/16
05/07/2023 05:18:18 INFO Quantizing attention.query_key_value in layer 7/16...
05/07/2023 05:18:19 INFO duration: 0.9676733016967773
05/07/2023 05:18:19 INFO avg loss: 2.131913417698349
05/07/2023 05:18:52 INFO Quantizing attention.dense in layer 7/16...
05/07/2023 05:18:53 INFO duration: 0.7196581363677979
05/07/2023 05:18:53 INFO avg loss: 0.20212076367915502
05/07/2023 05:19:26 INFO Quantizing mlp.dense_h_to_4h in layer 7/16...
05/07/2023 05:19:27 INFO duration: 1.0817346572875977
05/07/2023 05:19:27 INFO avg loss: 2.4321377462726304
05/07/2023 05:23:08 INFO Quantizing mlp.dense_4h_to_h in layer 7/16...
05/07/2023 05:23:14 INFO duration: 5.973307132720947
05/07/2023 05:23:14 INFO avg loss: 0.4796293378511049
05/07/2023 05:23:35 INFO Start quantizing layer 8/16
05/07/2023 05:24:08 INFO Quantizing attention.query_key_value in layer 8/16...
05/07/2023 05:24:09 INFO duration: 0.9668700695037842
05/07/2023 05:24:09 INFO avg loss: 2.3333008332501333
05/07/2023 05:24:42 INFO Quantizing attention.dense in layer 8/16...
05/07/2023 05:24:43 INFO duration: 0.7205338478088379
05/07/2023 05:24:43 INFO avg loss: 0.2906766491322218
05/07/2023 05:25:16 INFO Quantizing mlp.dense_h_to_4h in layer 8/16...
05/07/2023 05:25:17 INFO duration: 1.075392246246338
05/07/2023 05:25:17 INFO avg loss: 2.088160245690229
05/07/2023 05:28:59 INFO Quantizing mlp.dense_4h_to_h in layer 8/16...
05/07/2023 05:29:05 INFO duration: 6.0966198444366455
05/07/2023 05:29:05 INFO avg loss: 0.4126856014751398
05/07/2023 05:29:26 INFO Start quantizing layer 9/16
05/07/2023 05:29:59 INFO Quantizing attention.query_key_value in layer 9/16...
05/07/2023 05:30:00 INFO duration: 0.971062183380127
05/07/2023 05:30:00 INFO avg loss: 4.631909777689031
05/07/2023 05:30:33 INFO Quantizing attention.dense in layer 9/16...
05/07/2023 05:30:34 INFO duration: 0.7198226451873779
05/07/2023 05:30:34 INFO avg loss: 0.2723473172091321
05/07/2023 05:31:07 INFO Quantizing mlp.dense_h_to_4h in layer 9/16...
05/07/2023 05:31:08 INFO duration: 1.0791394710540771
05/07/2023 05:31:08 INFO avg loss: 2.0461749482078675
05/07/2023 05:34:49 INFO Quantizing mlp.dense_4h_to_h in layer 9/16...
05/07/2023 05:34:55 INFO duration: 5.983144044876099
05/07/2023 05:34:55 INFO avg loss: 0.5113805541342186
05/07/2023 05:35:16 INFO Start quantizing layer 10/16
05/07/2023 05:35:49 INFO Quantizing attention.query_key_value in layer 10/16...
05/07/2023 05:35:50 INFO duration: 0.9664998054504395
05/07/2023 05:35:50 INFO avg loss: 7.197037864416933
05/07/2023 05:36:23 INFO Quantizing attention.dense in layer 10/16...
05/07/2023 05:36:24 INFO duration: 0.7181813716888428
05/07/2023 05:36:24 INFO avg loss: 0.3427228673705405
05/07/2023 05:36:57 INFO Quantizing mlp.dense_h_to_4h in layer 10/16...
05/07/2023 05:36:58 INFO duration: 1.0781819820404053
05/07/2023 05:36:58 INFO avg loss: 2.320328880041933
05/07/2023 05:40:40 INFO Quantizing mlp.dense_4h_to_h in layer 10/16...
05/07/2023 05:40:46 INFO duration: 6.027331829071045
05/07/2023 05:40:46 INFO avg loss: 0.6135274056301584
05/07/2023 05:41:07 INFO Start quantizing layer 11/16
05/07/2023 05:41:40 INFO Quantizing attention.query_key_value in layer 11/16...
05/07/2023 05:41:41 INFO duration: 0.9669804573059082
05/07/2023 05:41:41 INFO avg loss: 7.502283845846645
05/07/2023 05:42:14 INFO Quantizing attention.dense in layer 11/16...
05/07/2023 05:42:14 INFO duration: 0.7167062759399414
05/07/2023 05:42:14 INFO avg loss: 0.2933824760591387
05/07/2023 05:42:47 INFO Quantizing mlp.dense_h_to_4h in layer 11/16...
05/07/2023 05:42:48 INFO duration: 1.077958345413208
05/07/2023 05:42:48 INFO avg loss: 2.6354988268769968
05/07/2023 05:46:30 INFO Quantizing mlp.dense_4h_to_h in layer 11/16...
05/07/2023 05:46:36 INFO duration: 5.968295335769653
05/07/2023 05:46:36 INFO avg loss: 0.7737983809238551
05/07/2023 05:46:57 INFO Start quantizing layer 12/16
05/07/2023 05:47:30 INFO Quantizing attention.query_key_value in layer 12/16...
05/07/2023 05:47:31 INFO duration: 0.9708924293518066
05/07/2023 05:47:31 INFO avg loss: 6.875169520433972
05/07/2023 05:48:04 INFO Quantizing attention.dense in layer 12/16...
05/07/2023 05:48:05 INFO duration: 0.7233545780181885
05/07/2023 05:48:05 INFO avg loss: 0.36776245897189497
05/07/2023 05:48:38 INFO Quantizing mlp.dense_h_to_4h in layer 12/16...
05/07/2023 05:48:39 INFO duration: 1.078718900680542
05/07/2023 05:48:39 INFO avg loss: 2.9615547415801386
05/07/2023 05:52:21 INFO Quantizing mlp.dense_4h_to_h in layer 12/16...
05/07/2023 05:52:27 INFO duration: 6.078177452087402
05/07/2023 05:52:27 INFO avg loss: 0.9158687896241015
05/07/2023 05:52:48 INFO Start quantizing layer 13/16
05/07/2023 05:53:21 INFO Quantizing attention.query_key_value in layer 13/16...
05/07/2023 05:53:22 INFO duration: 0.9698812961578369
05/07/2023 05:53:22 INFO avg loss: 5.93688639842918
05/07/2023 05:53:54 INFO Quantizing attention.dense in layer 13/16...
05/07/2023 05:53:55 INFO duration: 0.7205860614776611
05/07/2023 05:53:55 INFO avg loss: 0.24467934637912672
05/07/2023 05:54:28 INFO Quantizing mlp.dense_h_to_4h in layer 13/16...
05/07/2023 05:54:29 INFO duration: 1.0801022052764893
05/07/2023 05:54:29 INFO avg loss: 3.275802466054313
05/07/2023 05:58:11 INFO Quantizing mlp.dense_4h_to_h in layer 13/16...
05/07/2023 05:58:17 INFO duration: 6.09338641166687
05/07/2023 05:58:17 INFO avg loss: 1.0767965265991082
05/07/2023 05:58:38 INFO Start quantizing layer 14/16
05/07/2023 05:59:11 INFO Quantizing attention.query_key_value in layer 14/16...
05/07/2023 05:59:12 INFO duration: 0.9676227569580078
05/07/2023 05:59:12 INFO avg loss: 6.686944638578275
05/07/2023 05:59:45 INFO Quantizing attention.dense in layer 14/16...
05/07/2023 05:59:46 INFO duration: 0.7196416854858398
05/07/2023 05:59:46 INFO avg loss: 0.34242789661541534
05/07/2023 06:00:19 INFO Quantizing mlp.dense_h_to_4h in layer 14/16...
05/07/2023 06:00:20 INFO duration: 1.0829389095306396
05/07/2023 06:00:20 INFO avg loss: 3.705307965588392
05/07/2023 06:04:02 INFO Quantizing mlp.dense_4h_to_h in layer 14/16...
05/07/2023 06:04:08 INFO duration: 6.013010263442993
05/07/2023 06:04:08 INFO avg loss: 1.1975950458433173
05/07/2023 06:04:29 INFO Start quantizing layer 15/16
05/07/2023 06:05:02 INFO Quantizing attention.query_key_value in layer 15/16...
05/07/2023 06:05:03 INFO duration: 0.9704198837280273
05/07/2023 06:05:03 INFO avg loss: 7.567932973908413
05/07/2023 06:05:36 INFO Quantizing attention.dense in layer 15/16...
05/07/2023 06:05:37 INFO duration: 0.7222294807434082
05/07/2023 06:05:37 INFO avg loss: 0.4468821890184039
05/07/2023 06:06:10 INFO Quantizing mlp.dense_h_to_4h in layer 15/16...
05/07/2023 06:06:11 INFO duration: 1.0775363445281982
05/07/2023 06:06:11 INFO avg loss: 4.276716368393903
05/07/2023 06:09:52 INFO Quantizing mlp.dense_4h_to_h in layer 15/16...
05/07/2023 06:09:58 INFO duration: 6.097189664840698
05/07/2023 06:09:58 INFO avg loss: 1.6799194205937167
05/07/2023 06:10:19 INFO Start quantizing layer 16/16
05/07/2023 06:10:52 INFO Quantizing attention.query_key_value in layer 16/16...
05/07/2023 06:10:53 INFO duration: 0.9705617427825928
05/07/2023 06:10:53 INFO avg loss: 7.100380016972843
05/07/2023 06:11:26 INFO Quantizing attention.dense in layer 16/16...
05/07/2023 06:11:27 INFO duration: 0.722510814666748
05/07/2023 06:11:27 INFO avg loss: 0.24434113426330373
05/07/2023 06:12:00 INFO Quantizing mlp.dense_h_to_4h in layer 16/16...
05/07/2023 06:12:01 INFO duration: 1.0826246738433838
05/07/2023 06:12:01 INFO avg loss: 4.788446298422524
05/07/2023 06:15:43 INFO Quantizing mlp.dense_4h_to_h in layer 16/16...
05/07/2023 06:15:49 INFO duration: 6.170569658279419
05/07/2023 06:15:49 INFO avg loss: 1.7897084716536875
05/07/2023 06:16:11 INFO Packing model...
05/07/2023 06:16:11 INFO gpt_neox.layers.0.attention.dense
05/07/2023 06:16:12 INFO gpt_neox.layers.0.attention.query_key_value
05/07/2023 06:16:15 INFO gpt_neox.layers.0.mlp.dense_4h_to_h
05/07/2023 06:16:18 INFO gpt_neox.layers.0.mlp.dense_h_to_4h
05/07/2023 06:16:22 INFO gpt_neox.layers.1.attention.dense
05/07/2023 06:16:23 INFO gpt_neox.layers.1.attention.query_key_value
05/07/2023 06:16:26 INFO gpt_neox.layers.1.mlp.dense_4h_to_h
05/07/2023 06:16:29 INFO gpt_neox.layers.1.mlp.dense_h_to_4h
05/07/2023 06:16:33 INFO gpt_neox.layers.2.attention.dense
05/07/2023 06:16:34 INFO gpt_neox.layers.2.attention.query_key_value
05/07/2023 06:16:37 INFO gpt_neox.layers.2.mlp.dense_4h_to_h
05/07/2023 06:16:40 INFO gpt_neox.layers.2.mlp.dense_h_to_4h
05/07/2023 06:16:44 INFO gpt_neox.layers.3.attention.dense
05/07/2023 06:16:45 INFO gpt_neox.layers.3.attention.query_key_value
05/07/2023 06:16:48 INFO gpt_neox.layers.3.mlp.dense_4h_to_h
05/07/2023 06:16:51 INFO gpt_neox.layers.3.mlp.dense_h_to_4h
05/07/2023 06:16:56 INFO gpt_neox.layers.4.attention.dense
05/07/2023 06:16:56 INFO gpt_neox.layers.4.attention.query_key_value
05/07/2023 06:16:59 INFO gpt_neox.layers.4.mlp.dense_4h_to_h
05/07/2023 06:17:03 INFO gpt_neox.layers.4.mlp.dense_h_to_4h
05/07/2023 06:17:07 INFO gpt_neox.layers.5.attention.dense
05/07/2023 06:17:08 INFO gpt_neox.layers.5.attention.query_key_value
05/07/2023 06:17:11 INFO gpt_neox.layers.5.mlp.dense_4h_to_h
05/07/2023 06:17:14 INFO gpt_neox.layers.5.mlp.dense_h_to_4h
05/07/2023 06:17:18 INFO gpt_neox.layers.6.attention.dense
05/07/2023 06:17:19 INFO gpt_neox.layers.6.attention.query_key_value
05/07/2023 06:17:22 INFO gpt_neox.layers.6.mlp.dense_4h_to_h
05/07/2023 06:17:25 INFO gpt_neox.layers.6.mlp.dense_h_to_4h
05/07/2023 06:17:29 INFO gpt_neox.layers.7.attention.dense
05/07/2023 06:17:30 INFO gpt_neox.layers.7.attention.query_key_value
05/07/2023 06:17:33 INFO gpt_neox.layers.7.mlp.dense_4h_to_h
05/07/2023 06:17:36 INFO gpt_neox.layers.7.mlp.dense_h_to_4h
05/07/2023 06:17:40 INFO gpt_neox.layers.8.attention.dense
05/07/2023 06:17:41 INFO gpt_neox.layers.8.attention.query_key_value
05/07/2023 06:17:44 INFO gpt_neox.layers.8.mlp.dense_4h_to_h
05/07/2023 06:17:47 INFO gpt_neox.layers.8.mlp.dense_h_to_4h
05/07/2023 06:17:51 INFO gpt_neox.layers.9.attention.dense
05/07/2023 06:17:52 INFO gpt_neox.layers.9.attention.query_key_value
05/07/2023 06:17:55 INFO gpt_neox.layers.9.mlp.dense_4h_to_h
05/07/2023 06:17:58 INFO gpt_neox.layers.9.mlp.dense_h_to_4h
05/07/2023 06:18:02 INFO gpt_neox.layers.10.attention.dense
05/07/2023 06:18:03 INFO gpt_neox.layers.10.attention.query_key_value
05/07/2023 06:18:06 INFO gpt_neox.layers.10.mlp.dense_4h_to_h
05/07/2023 06:18:09 INFO gpt_neox.layers.10.mlp.dense_h_to_4h
05/07/2023 06:18:13 INFO gpt_neox.layers.11.attention.dense
05/07/2023 06:18:14 INFO gpt_neox.layers.11.attention.query_key_value
05/07/2023 06:18:17 INFO gpt_neox.layers.11.mlp.dense_4h_to_h
05/07/2023 06:18:20 INFO gpt_neox.layers.11.mlp.dense_h_to_4h
05/07/2023 06:18:24 INFO gpt_neox.layers.12.attention.dense
05/07/2023 06:18:25 INFO gpt_neox.layers.12.attention.query_key_value
05/07/2023 06:18:28 INFO gpt_neox.layers.12.mlp.dense_4h_to_h
05/07/2023 06:18:31 INFO gpt_neox.layers.12.mlp.dense_h_to_4h
05/07/2023 06:18:35 INFO gpt_neox.layers.13.attention.dense
05/07/2023 06:18:36 INFO gpt_neox.layers.13.attention.query_key_value
05/07/2023 06:18:39 INFO gpt_neox.layers.13.mlp.dense_4h_to_h
05/07/2023 06:18:42 INFO gpt_neox.layers.13.mlp.dense_h_to_4h
05/07/2023 06:18:46 INFO gpt_neox.layers.14.attention.dense
05/07/2023 06:18:47 INFO gpt_neox.layers.14.attention.query_key_value
05/07/2023 06:18:50 INFO gpt_neox.layers.14.mlp.dense_4h_to_h
05/07/2023 06:18:53 INFO gpt_neox.layers.14.mlp.dense_h_to_4h
05/07/2023 06:18:57 INFO gpt_neox.layers.15.attention.dense
05/07/2023 06:18:58 INFO gpt_neox.layers.15.attention.query_key_value
05/07/2023 06:19:01 INFO gpt_neox.layers.15.mlp.dense_4h_to_h
05/07/2023 06:19:04 INFO gpt_neox.layers.15.mlp.dense_h_to_4h
05/07/2023 06:19:08 INFO Model packed.
05/07/2023 06:19:08 WARNING using autotune_warmup will move model to GPU, make sure you have enough VRAM to load the whole model.
05/07/2023 06:19:09 INFO Found 4 unique KN Linear values.
05/07/2023 06:19:09 INFO Warming up autotune cache ...
05/07/2023 06:19:58 INFO Done! Saving..
05/07/2023 06:20:05 INFO Saved. Size of the model file(s): 10063.64 MB
05/07/2023 06:20:05 WARNING use_triton will force moving the whole model to GPU, make sure you have enough VRAM.
05/07/2023 06:20:05 INFO embed_out not been quantized, will be ignored when make_quant.
05/07/2023 06:20:06 WARNING The safetensors archive passed at /home/pszemraj/workspace/misc-train/quantization/quantized-models/stablelm-7b-sft-v7-epoch-3-4bit-128g/gptq_model-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
05/07/2023 06:20:06 INFO Found 4 unique KN Linear values.
05/07/2023 06:20:06 INFO Warming up autotune cache ...
05/07/2023 06:20:07 INFO Sample output: ('Because woodchucks (or squirrels, as they\'re also known) are "the chink[e] '
 'of wood."')
05/07/2023 06:20:07 INFO GPU memory usage during test inference: 4.61 GB
05/07/2023 06:20:08 WARNING No such comm: d349e6339e5442e4a3286af931f0699f
05/07/2023 06:20:08 WARNING No such comm: 9374387013794a8bab6ba19cace86d58
05/07/2023 06:20:08 WARNING No such comm: bf152b67bcc04b93863ac311ea4df76a
05/07/2023 06:20:08 WARNING No such comm: 118ccfc8fe874373ae03f8132fb8c258
05/07/2023 06:20:08 WARNING No such comm: 9d85d31e378c44ce9119ead8b83e7556
05/07/2023 06:20:08 WARNING No such comm: c8c5130cae894895a66be12fe834c673
05/07/2023 06:20:08 WARNING No such comm: 237ea212dbd74befad2f34ba2161307d
05/07/2023 06:20:08 WARNING No such comm: 86eda75ae855461b8f5c1ae5b3a83cec
05/07/2023 06:20:08 WARNING No such comm: 63731c4e51f0433fbf85712c08c3d4bf
05/07/2023 06:20:08 WARNING No such comm: 2079a099466341488fc017f30e9359a8
05/07/2023 06:20:08 WARNING No such comm: 99fa75439d3d47c0a6a5b7c25c526718
05/07/2023 06:20:09 WARNING use_triton will force moving the whole model to GPU, make sure you have enough VRAM.
05/07/2023 06:20:09 INFO embed_out not been quantized, will be ignored when make_quant.
05/07/2023 06:20:09 WARNING The safetensors archive passed at /home/pszemraj/workspace/misc-train/quantization/quantized-models/stablelm-7b-sft-v7-epoch-3-4bit-128g/gptq_model-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
05/07/2023 06:20:10 INFO Found 4 unique KN Linear values.
05/07/2023 06:20:10 INFO Warming up autotune cache ...
05/07/2023 06:31:04 WARNING use_triton will force moving the whole model to GPU, make sure you have enough VRAM.
05/07/2023 06:31:04 INFO embed_out not been quantized, will be ignored when make_quant.
05/07/2023 06:31:04 WARNING The safetensors archive passed at /home/pszemraj/workspace/misc-train/quantization/quantized-models/stablelm-7b-sft-v7-epoch-3-4bit-128g/gptq_model-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
05/07/2023 06:31:05 INFO Found 4 unique KN Linear values.
05/07/2023 06:31:05 INFO Warming up autotune cache ...
05/07/2023 06:31:46 WARNING use_triton will force moving the whole model to GPU, make sure you have enough VRAM.
05/07/2023 06:31:46 INFO embed_out not been quantized, will be ignored when make_quant.
05/07/2023 06:31:46 WARNING The safetensors archive passed at /home/pszemraj/workspace/misc-train/quantization/quantized-models/stablelm-7b-sft-v7-epoch-3-4bit-128g/gptq_model-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
05/07/2023 06:31:46 INFO Found 4 unique KN Linear values.
05/07/2023 06:31:46 INFO Warming up autotune cache ...
05/07/2023 06:32:16 WARNING use_triton will force moving the whole model to GPU, make sure you have enough VRAM.
05/07/2023 06:32:16 INFO embed_out not been quantized, will be ignored when make_quant.
05/07/2023 06:32:16 WARNING The safetensors archive passed at /home/pszemraj/workspace/misc-train/quantization/quantized-models/stablelm-7b-sft-v7-epoch-3-4bit-128g/gptq_model-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
05/07/2023 06:32:16 INFO Found 4 unique KN Linear values.
05/07/2023 06:32:16 INFO Warming up autotune cache ...
05/07/2023 06:32:42 WARNING use_triton will force moving the whole model to GPU, make sure you have enough VRAM.
05/07/2023 06:32:42 INFO embed_out not been quantized, will be ignored when make_quant.
05/07/2023 06:32:42 WARNING The safetensors archive passed at /home/pszemraj/workspace/misc-train/quantization/quantized-models/stablelm-7b-sft-v7-epoch-3-4bit-128g/gptq_model-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
05/07/2023 06:32:42 INFO Found 4 unique KN Linear values.
05/07/2023 06:32:42 INFO Warming up autotune cache ...
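
For reference, below is a minimal sketch of the kind of AutoGPTQ script that produces a log like the one above. The model id, calibration-sample count, and keyword names are assumptions inferred from the log (the save path ending in "4bit-128g" implies bits=4 and group_size=128; the cached-dataset line implies oasst1 calibration data; argument names follow the auto-gptq API and may differ slightly in the exact version that was run). It is not the script that generated this log.

    # Sketch only: ids, sample count, and flag names below are assumptions
    # inferred from the log, not taken from the original script.
    from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
    from datasets import load_dataset
    from transformers import AutoTokenizer

    model_id = "OpenAssistant/stablelm-7b-sft-v7-epoch-3"  # inferred from the save path
    out_dir = "quantized-models/stablelm-7b-sft-v7-epoch-3-4bit-128g"

    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Calibration data: the first log line shows a cached copy of oasst1 being reused.
    ds = load_dataset("OpenAssistant/oasst1", split="train")
    examples = [
        tokenizer(row["text"], return_tensors="pt", truncation=True)
        for row in ds.select(range(128))  # 128 calibration samples is an assumption
    ]

    # "4bit-128g" in the output directory implies these settings.
    quantize_config = BaseQuantizeConfig(bits=4, group_size=128)

    model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)
    model.quantize(examples)  # "Start quantizing layer 1/16" ... "Model packed."
    model.save_quantized(out_dir, use_safetensors=True)  # "Done! Saving.."

    # Reloading with the Triton backend triggers the "use_triton will force
    # moving the whole model to GPU" warning; warming up the Triton autotune
    # cache produces the "Found 4 unique KN Linear values" lines.
    model = AutoGPTQForCausalLM.from_quantized(
        out_dir, device="cuda:0", use_safetensors=True,
        use_triton=True, warmup_triton=True,
    )

The per-layer "avg loss" values are GPTQ's average reconstruction error for each quantized module; the generally increasing values from layer 1 to 16 seen above, and the larger errors on query_key_value and dense_h_to_4h relative to the narrower projections, are common in 4-bit GPTQ runs.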