numen-tech committed
Commit 13b2934
1 Parent(s): a69ab30

Add weights

This view is limited to 50 files because it contains too many changes.
README.md CHANGED
@@ -1,3 +1,17 @@
  ---
+ language:
+ - en
+ - de
+ - fr
+ - it
+ - pt
+ - hi
+ - es
+ - th
  license: llama3.2
+ base_model: meta-llama/Llama-3.2-1B-Instruct
+ library_name: mlc-llm
+ pipeline_tag: text-generation
  ---
+
+ Unquantized (fp16, the parent model is bf16) version of [Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) for use with the [Private LLM app](https://privatellm.app/).
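
The ndarray-cache.json manifest added below maps every tensor to a byte range inside one of the `params_shard_*.bin` files: each shard entry lists its records with a `shape`, `dtype`, `nbytes`, and `byteOffset`. As a minimal illustration (not part of the commit), the sketch below checks a couple of shard entries, excerpted verbatim from the manifest, for internal consistency: each record's `nbytes` must equal the element count times 2 bytes for `float16`, records must tile the shard contiguously, and the offsets must sum to the shard's own `nbytes`.

```python
import json
from math import prod

# Two shard entries excerpted from this commit's ndarray-cache.json
# (md5sum and "format" fields omitted for brevity).
MANIFEST = json.loads("""
{
  "records": [
    {"dataPath": "params_shard_1.bin", "nbytes": 33554432,
     "records": [
       {"name": "model.layers.0.mlp.down_proj.weight",
        "shape": [2048, 8192], "dtype": "float16",
        "nbytes": 33554432, "byteOffset": 0}
     ]},
    {"dataPath": "params_shard_8.bin", "nbytes": 20979712,
     "records": [
       {"name": "model.layers.1.self_attn.qkv_proj.weight",
        "shape": [3072, 2048], "dtype": "float16",
        "nbytes": 12582912, "byteOffset": 0},
       {"name": "model.layers.1.self_attn.o_proj.weight",
        "shape": [2048, 2048], "dtype": "float16",
        "nbytes": 8388608, "byteOffset": 12582912},
       {"name": "model.layers.10.input_layernorm.weight",
        "shape": [2048], "dtype": "float16",
        "nbytes": 4096, "byteOffset": 20971520},
       {"name": "model.layers.10.post_attention_layernorm.weight",
        "shape": [2048], "dtype": "float16",
        "nbytes": 4096, "byteOffset": 20975616}
     ]}
  ]
}
""")

DTYPE_BYTES = {"float16": 2, "float32": 4}

def check_shard(shard):
    """Verify records tile the shard contiguously and sizes match shapes."""
    offset = 0
    for rec in shard["records"]:
        expected = prod(rec["shape"]) * DTYPE_BYTES[rec["dtype"]]
        assert rec["nbytes"] == expected, rec["name"]
        assert rec["byteOffset"] == offset, rec["name"]
        offset += rec["nbytes"]
    assert offset == shard["nbytes"], shard["dataPath"]

for shard in MANIFEST["records"]:
    check_shard(shard)
print("all shard entries consistent")
```

Note how the combined attention/layernorm shards pack four tensors back to back, which is why their `byteOffset` values (0, 12582912, 20971520, 20975616) appear repeatedly in the manifest.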
ndarray-cache.json ADDED
@@ -0,0 +1,1446 @@
+ {
+ "metadata": {
+ "ParamSize": 98,
+ "ParamBytes": 2471628800.0,
+ "BitsPerParam": 16.0
+ },
+ "records": [
+ {
+ "dataPath": "params_shard_0.bin",
+ "format": "raw-shard",
+ "nbytes": 525336576,
+ "records": [
+ {
+ "name": "model.embed_tokens.weight",
+ "shape": [
+ 128256,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 525336576,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "5fe6bfddebce8d7687658d3fbaa5a3bb"
+ },
+ {
+ "dataPath": "params_shard_1.bin",
+ "format": "raw-shard",
+ "nbytes": 33554432,
+ "records": [
+ {
+ "name": "model.layers.0.mlp.down_proj.weight",
+ "shape": [
+ 2048,
+ 8192
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 33554432,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "649a3faa52ddd6011b5bf890357144d7"
+ },
+ {
+ "dataPath": "params_shard_2.bin",
+ "format": "raw-shard",
+ "nbytes": 67108864,
+ "records": [
+ {
+ "name": "model.layers.0.mlp.gate_up_proj.weight",
+ "shape": [
+ 16384,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 67108864,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "a146ea08e79066f263468617118d49af"
+ },
+ {
+ "dataPath": "params_shard_3.bin",
+ "format": "raw-shard",
+ "nbytes": 33554432,
+ "records": [
+ {
+ "name": "model.layers.1.mlp.down_proj.weight",
+ "shape": [
+ 2048,
+ 8192
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 33554432,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "4773b047a56a813abc31d78bc7ea458b"
+ },
+ {
+ "dataPath": "params_shard_4.bin",
+ "format": "raw-shard",
+ "nbytes": 67108864,
+ "records": [
+ {
+ "name": "model.layers.1.mlp.gate_up_proj.weight",
+ "shape": [
+ 16384,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 67108864,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "757e94d880a0af3916245d1abf2f0cc9"
+ },
+ {
+ "dataPath": "params_shard_5.bin",
+ "format": "raw-shard",
+ "nbytes": 20987904,
+ "records": [
+ {
+ "name": "model.layers.0.input_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.0.post_attention_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 4096
+ },
+ {
+ "name": "model.layers.0.self_attn.qkv_proj.weight",
+ "shape": [
+ 3072,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 12582912,
+ "byteOffset": 8192
+ },
+ {
+ "name": "model.layers.0.self_attn.o_proj.weight",
+ "shape": [
+ 2048,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 8388608,
+ "byteOffset": 12591104
+ },
+ {
+ "name": "model.layers.1.input_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 20979712
+ },
+ {
+ "name": "model.layers.1.post_attention_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 20983808
+ }
+ ],
+ "md5sum": "9c81d34101a3a6d5e083c9294d6c072e"
+ },
+ {
+ "dataPath": "params_shard_6.bin",
+ "format": "raw-shard",
+ "nbytes": 33554432,
+ "records": [
+ {
+ "name": "model.layers.10.mlp.down_proj.weight",
+ "shape": [
+ 2048,
+ 8192
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 33554432,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "72c909870f091d2288d5f1bbf14227a7"
+ },
+ {
+ "dataPath": "params_shard_7.bin",
+ "format": "raw-shard",
+ "nbytes": 67108864,
+ "records": [
+ {
+ "name": "model.layers.10.mlp.gate_up_proj.weight",
+ "shape": [
+ 16384,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 67108864,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "17d7431f3e56f56b8a4bff1c4ba21b16"
+ },
+ {
+ "dataPath": "params_shard_8.bin",
+ "format": "raw-shard",
+ "nbytes": 20979712,
+ "records": [
+ {
+ "name": "model.layers.1.self_attn.qkv_proj.weight",
+ "shape": [
+ 3072,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 12582912,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.1.self_attn.o_proj.weight",
+ "shape": [
+ 2048,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 8388608,
+ "byteOffset": 12582912
+ },
+ {
+ "name": "model.layers.10.input_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 20971520
+ },
+ {
+ "name": "model.layers.10.post_attention_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 20975616
+ }
+ ],
+ "md5sum": "307158910e9ffcdbea5f1c0fe216793a"
+ },
+ {
+ "dataPath": "params_shard_9.bin",
+ "format": "raw-shard",
+ "nbytes": 33554432,
+ "records": [
+ {
+ "name": "model.layers.11.mlp.down_proj.weight",
+ "shape": [
+ 2048,
+ 8192
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 33554432,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "5fb14051ac92a7a55be71a22fff7cd3d"
+ },
+ {
+ "dataPath": "params_shard_10.bin",
+ "format": "raw-shard",
+ "nbytes": 67108864,
+ "records": [
+ {
+ "name": "model.layers.11.mlp.gate_up_proj.weight",
+ "shape": [
+ 16384,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 67108864,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "4df532c18c7c8f280a7586515c561e99"
+ },
+ {
+ "dataPath": "params_shard_11.bin",
+ "format": "raw-shard",
+ "nbytes": 20979712,
+ "records": [
+ {
+ "name": "model.layers.10.self_attn.qkv_proj.weight",
+ "shape": [
+ 3072,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 12582912,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.10.self_attn.o_proj.weight",
+ "shape": [
+ 2048,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 8388608,
+ "byteOffset": 12582912
+ },
+ {
+ "name": "model.layers.11.input_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 20971520
+ },
+ {
+ "name": "model.layers.11.post_attention_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 20975616
+ }
+ ],
+ "md5sum": "039b13862f3f7164be75f5b572aa4228"
+ },
+ {
+ "dataPath": "params_shard_12.bin",
+ "format": "raw-shard",
+ "nbytes": 33554432,
+ "records": [
+ {
+ "name": "model.layers.12.mlp.down_proj.weight",
+ "shape": [
+ 2048,
+ 8192
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 33554432,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "c0438cb3f6fc4377f810b81e1bd007c1"
+ },
+ {
+ "dataPath": "params_shard_13.bin",
+ "format": "raw-shard",
+ "nbytes": 67108864,
+ "records": [
+ {
+ "name": "model.layers.12.mlp.gate_up_proj.weight",
+ "shape": [
+ 16384,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 67108864,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "5fd85ab93fe4184608a942ff68c831d5"
+ },
+ {
+ "dataPath": "params_shard_14.bin",
+ "format": "raw-shard",
+ "nbytes": 20979712,
+ "records": [
+ {
+ "name": "model.layers.11.self_attn.qkv_proj.weight",
+ "shape": [
+ 3072,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 12582912,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.11.self_attn.o_proj.weight",
+ "shape": [
+ 2048,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 8388608,
+ "byteOffset": 12582912
+ },
+ {
+ "name": "model.layers.12.input_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 20971520
+ },
+ {
+ "name": "model.layers.12.post_attention_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 20975616
+ }
+ ],
+ "md5sum": "ebdea31540cce36afe9450214d1bfd3d"
+ },
+ {
+ "dataPath": "params_shard_15.bin",
+ "format": "raw-shard",
+ "nbytes": 33554432,
+ "records": [
+ {
+ "name": "model.layers.13.mlp.down_proj.weight",
+ "shape": [
+ 2048,
+ 8192
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 33554432,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "9d302afd4a80e8a0548c36f3991d9ad5"
+ },
+ {
+ "dataPath": "params_shard_16.bin",
+ "format": "raw-shard",
+ "nbytes": 67108864,
+ "records": [
+ {
+ "name": "model.layers.13.mlp.gate_up_proj.weight",
+ "shape": [
+ 16384,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 67108864,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "cad57947b90650ef9d736f7a55790e9f"
+ },
+ {
+ "dataPath": "params_shard_17.bin",
+ "format": "raw-shard",
+ "nbytes": 20979712,
+ "records": [
+ {
+ "name": "model.layers.12.self_attn.qkv_proj.weight",
+ "shape": [
+ 3072,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 12582912,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.12.self_attn.o_proj.weight",
+ "shape": [
+ 2048,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 8388608,
+ "byteOffset": 12582912
+ },
+ {
+ "name": "model.layers.13.input_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 20971520
+ },
+ {
+ "name": "model.layers.13.post_attention_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 20975616
+ }
+ ],
+ "md5sum": "66158059d9c393abcac5586ac8de7cf1"
+ },
+ {
+ "dataPath": "params_shard_18.bin",
+ "format": "raw-shard",
+ "nbytes": 33554432,
+ "records": [
+ {
+ "name": "model.layers.14.mlp.down_proj.weight",
+ "shape": [
+ 2048,
+ 8192
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 33554432,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "0c8824c203e4055c1d1540faab369234"
+ },
+ {
+ "dataPath": "params_shard_19.bin",
+ "format": "raw-shard",
+ "nbytes": 67108864,
+ "records": [
+ {
+ "name": "model.layers.14.mlp.gate_up_proj.weight",
+ "shape": [
+ 16384,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 67108864,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "317da7cb6758a90a6c45d597e278e5d1"
+ },
+ {
+ "dataPath": "params_shard_20.bin",
+ "format": "raw-shard",
+ "nbytes": 20979712,
+ "records": [
+ {
+ "name": "model.layers.13.self_attn.qkv_proj.weight",
+ "shape": [
+ 3072,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 12582912,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.13.self_attn.o_proj.weight",
+ "shape": [
+ 2048,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 8388608,
+ "byteOffset": 12582912
+ },
+ {
+ "name": "model.layers.14.input_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 20971520
+ },
+ {
+ "name": "model.layers.14.post_attention_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 20975616
+ }
+ ],
+ "md5sum": "6c577fbecc01f72992423567b0bd1481"
+ },
+ {
+ "dataPath": "params_shard_21.bin",
+ "format": "raw-shard",
+ "nbytes": 33554432,
+ "records": [
+ {
+ "name": "model.layers.15.mlp.down_proj.weight",
+ "shape": [
+ 2048,
+ 8192
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 33554432,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "7ec87a42eb0dc91c9813a4e012b20c2f"
+ },
+ {
+ "dataPath": "params_shard_22.bin",
+ "format": "raw-shard",
+ "nbytes": 67108864,
+ "records": [
+ {
+ "name": "model.layers.15.mlp.gate_up_proj.weight",
+ "shape": [
+ 16384,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 67108864,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "70f847da51c9aec779ef299f8077afb4"
+ },
+ {
+ "dataPath": "params_shard_23.bin",
+ "format": "raw-shard",
+ "nbytes": 20979712,
+ "records": [
+ {
+ "name": "model.layers.14.self_attn.qkv_proj.weight",
+ "shape": [
+ 3072,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 12582912,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.14.self_attn.o_proj.weight",
+ "shape": [
+ 2048,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 8388608,
+ "byteOffset": 12582912
+ },
+ {
+ "name": "model.layers.15.input_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 20971520
+ },
+ {
+ "name": "model.layers.15.post_attention_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 20975616
+ }
+ ],
+ "md5sum": "6026821264760d740b1409c0bffa1deb"
+ },
+ {
+ "dataPath": "params_shard_24.bin",
+ "format": "raw-shard",
+ "nbytes": 33554432,
+ "records": [
+ {
+ "name": "model.layers.2.mlp.down_proj.weight",
+ "shape": [
+ 2048,
+ 8192
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 33554432,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "39d8449ebda4b408ee6959645b49994f"
+ },
+ {
+ "dataPath": "params_shard_25.bin",
+ "format": "raw-shard",
+ "nbytes": 67108864,
+ "records": [
+ {
+ "name": "model.layers.2.mlp.gate_up_proj.weight",
+ "shape": [
+ 16384,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 67108864,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "807a9c5343769339cb9624ae75ed216a"
+ },
+ {
+ "dataPath": "params_shard_26.bin",
+ "format": "raw-shard",
+ "nbytes": 20979712,
+ "records": [
+ {
+ "name": "model.layers.15.self_attn.qkv_proj.weight",
+ "shape": [
+ 3072,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 12582912,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.15.self_attn.o_proj.weight",
+ "shape": [
+ 2048,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 8388608,
+ "byteOffset": 12582912
+ },
+ {
+ "name": "model.layers.2.input_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 20971520
+ },
+ {
+ "name": "model.layers.2.post_attention_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 20975616
+ }
+ ],
+ "md5sum": "77329a20b587c214dd987314bdfead17"
+ },
+ {
+ "dataPath": "params_shard_27.bin",
+ "format": "raw-shard",
+ "nbytes": 33554432,
+ "records": [
+ {
+ "name": "model.layers.3.mlp.down_proj.weight",
+ "shape": [
+ 2048,
+ 8192
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 33554432,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "38a444decbac4826f46fcfe12380e15e"
+ },
+ {
+ "dataPath": "params_shard_28.bin",
+ "format": "raw-shard",
+ "nbytes": 67108864,
+ "records": [
+ {
+ "name": "model.layers.3.mlp.gate_up_proj.weight",
+ "shape": [
+ 16384,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 67108864,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "0f07125504c8a77b1cc9f36cdb0b8022"
+ },
+ {
+ "dataPath": "params_shard_29.bin",
+ "format": "raw-shard",
+ "nbytes": 20979712,
+ "records": [
+ {
+ "name": "model.layers.2.self_attn.qkv_proj.weight",
+ "shape": [
+ 3072,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 12582912,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.2.self_attn.o_proj.weight",
+ "shape": [
+ 2048,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 8388608,
+ "byteOffset": 12582912
+ },
+ {
+ "name": "model.layers.3.input_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 20971520
+ },
+ {
+ "name": "model.layers.3.post_attention_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 20975616
+ }
+ ],
+ "md5sum": "2d1357a7be66e40adb43693a389f015e"
+ },
+ {
+ "dataPath": "params_shard_30.bin",
+ "format": "raw-shard",
+ "nbytes": 33554432,
+ "records": [
+ {
+ "name": "model.layers.4.mlp.down_proj.weight",
+ "shape": [
+ 2048,
+ 8192
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 33554432,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "4a2a4bb5a43a017cba209f668cd87347"
+ },
+ {
+ "dataPath": "params_shard_31.bin",
+ "format": "raw-shard",
+ "nbytes": 67108864,
+ "records": [
+ {
+ "name": "model.layers.4.mlp.gate_up_proj.weight",
+ "shape": [
+ 16384,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 67108864,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "e862a436e391ac68e461ef9667250725"
+ },
+ {
+ "dataPath": "params_shard_32.bin",
+ "format": "raw-shard",
+ "nbytes": 20979712,
+ "records": [
+ {
+ "name": "model.layers.3.self_attn.qkv_proj.weight",
+ "shape": [
+ 3072,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 12582912,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.3.self_attn.o_proj.weight",
+ "shape": [
+ 2048,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 8388608,
+ "byteOffset": 12582912
+ },
+ {
+ "name": "model.layers.4.input_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 20971520
+ },
+ {
+ "name": "model.layers.4.post_attention_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 20975616
+ }
+ ],
+ "md5sum": "e59f320408694ca1caca6b143ad0d63e"
+ },
+ {
+ "dataPath": "params_shard_33.bin",
+ "format": "raw-shard",
+ "nbytes": 33554432,
+ "records": [
+ {
+ "name": "model.layers.5.mlp.down_proj.weight",
+ "shape": [
+ 2048,
+ 8192
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 33554432,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "f1f7264cb6daee8fadf650ffe5483d85"
+ },
+ {
+ "dataPath": "params_shard_34.bin",
+ "format": "raw-shard",
+ "nbytes": 67108864,
+ "records": [
+ {
+ "name": "model.layers.5.mlp.gate_up_proj.weight",
+ "shape": [
+ 16384,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 67108864,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "c128f1b4cdcbee7d1fe0c440478735fa"
+ },
+ {
+ "dataPath": "params_shard_35.bin",
+ "format": "raw-shard",
+ "nbytes": 20979712,
+ "records": [
+ {
+ "name": "model.layers.4.self_attn.qkv_proj.weight",
+ "shape": [
+ 3072,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 12582912,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.4.self_attn.o_proj.weight",
+ "shape": [
+ 2048,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 8388608,
+ "byteOffset": 12582912
+ },
+ {
+ "name": "model.layers.5.input_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 20971520
+ },
+ {
+ "name": "model.layers.5.post_attention_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 20975616
+ }
+ ],
+ "md5sum": "0a61d05b6358398e961f11da4841ebf7"
+ },
+ {
+ "dataPath": "params_shard_36.bin",
+ "format": "raw-shard",
+ "nbytes": 33554432,
+ "records": [
+ {
+ "name": "model.layers.6.mlp.down_proj.weight",
+ "shape": [
+ 2048,
+ 8192
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 33554432,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "7d96d469a438acd143d71bdb8ae11053"
+ },
+ {
+ "dataPath": "params_shard_37.bin",
+ "format": "raw-shard",
+ "nbytes": 67108864,
+ "records": [
+ {
+ "name": "model.layers.6.mlp.gate_up_proj.weight",
+ "shape": [
+ 16384,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 67108864,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "940d4b77e813c39131c8b190a96afbdc"
+ },
+ {
+ "dataPath": "params_shard_38.bin",
+ "format": "raw-shard",
+ "nbytes": 20979712,
+ "records": [
+ {
+ "name": "model.layers.5.self_attn.qkv_proj.weight",
+ "shape": [
+ 3072,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 12582912,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.5.self_attn.o_proj.weight",
+ "shape": [
+ 2048,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 8388608,
+ "byteOffset": 12582912
+ },
+ {
+ "name": "model.layers.6.input_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 20971520
+ },
+ {
+ "name": "model.layers.6.post_attention_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 20975616
+ }
+ ],
+ "md5sum": "d1c8264837f5511329f39ef0c5e12cca"
+ },
+ {
+ "dataPath": "params_shard_39.bin",
+ "format": "raw-shard",
+ "nbytes": 33554432,
+ "records": [
+ {
+ "name": "model.layers.7.mlp.down_proj.weight",
+ "shape": [
+ 2048,
+ 8192
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 33554432,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "6b6ba1195aa2a5d584dbf22d96cbc780"
+ },
+ {
+ "dataPath": "params_shard_40.bin",
+ "format": "raw-shard",
+ "nbytes": 67108864,
+ "records": [
+ {
+ "name": "model.layers.7.mlp.gate_up_proj.weight",
+ "shape": [
+ 16384,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 67108864,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "0fe783a7fa4365570b61f40b1d02e992"
+ },
+ {
+ "dataPath": "params_shard_41.bin",
+ "format": "raw-shard",
+ "nbytes": 20979712,
+ "records": [
+ {
+ "name": "model.layers.6.self_attn.qkv_proj.weight",
+ "shape": [
+ 3072,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 12582912,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.6.self_attn.o_proj.weight",
+ "shape": [
+ 2048,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 8388608,
+ "byteOffset": 12582912
+ },
+ {
+ "name": "model.layers.7.input_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 20971520
+ },
+ {
+ "name": "model.layers.7.post_attention_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 20975616
+ }
+ ],
+ "md5sum": "3d5c6266a9e3be363d7b1636aed7acab"
+ },
+ {
+ "dataPath": "params_shard_42.bin",
+ "format": "raw-shard",
+ "nbytes": 33554432,
+ "records": [
+ {
+ "name": "model.layers.8.mlp.down_proj.weight",
+ "shape": [
+ 2048,
+ 8192
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 33554432,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "6528032fc98667817d9bc4f8e22a778f"
+ },
+ {
+ "dataPath": "params_shard_43.bin",
+ "format": "raw-shard",
+ "nbytes": 67108864,
+ "records": [
+ {
+ "name": "model.layers.8.mlp.gate_up_proj.weight",
+ "shape": [
+ 16384,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 67108864,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "f8f919204af1a388d05201e6c2964dd2"
+ },
+ {
+ "dataPath": "params_shard_44.bin",
+ "format": "raw-shard",
+ "nbytes": 20979712,
+ "records": [
+ {
+ "name": "model.layers.7.self_attn.qkv_proj.weight",
+ "shape": [
+ 3072,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 12582912,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.7.self_attn.o_proj.weight",
+ "shape": [
+ 2048,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 8388608,
+ "byteOffset": 12582912
+ },
+ {
+ "name": "model.layers.8.input_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 20971520
+ },
+ {
+ "name": "model.layers.8.post_attention_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 20975616
+ }
+ ],
+ "md5sum": "81b962b29fa9e116e8d210f867f356b7"
+ },
+ {
+ "dataPath": "params_shard_45.bin",
+ "format": "raw-shard",
+ "nbytes": 33554432,
+ "records": [
+ {
+ "name": "model.layers.9.mlp.down_proj.weight",
+ "shape": [
+ 2048,
+ 8192
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 33554432,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "29e8087714358ca3b41b85046a80c13d"
+ },
+ {
+ "dataPath": "params_shard_46.bin",
+ "format": "raw-shard",
+ "nbytes": 67108864,
+ "records": [
+ {
+ "name": "model.layers.9.mlp.gate_up_proj.weight",
+ "shape": [
+ 16384,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 67108864,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "d56e3e4e2fb83591eeeacad6fa10e9e9"
+ },
+ {
+ "dataPath": "params_shard_47.bin",
+ "format": "raw-shard",
+ "nbytes": 20979712,
+ "records": [
+ {
+ "name": "model.layers.8.self_attn.qkv_proj.weight",
+ "shape": [
+ 3072,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 12582912,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.8.self_attn.o_proj.weight"
1373
+ "shape": [
1374
+ 2048,
1375
+ 2048
1376
+ ],
1377
+ "dtype": "float16",
1378
+ "format": "f32-to-bf16",
1379
+ "nbytes": 8388608,
1380
+ "byteOffset": 12582912
1381
+ },
1382
+ {
1383
+ "name": "model.layers.9.input_layernorm.weight",
1384
+ "shape": [
1385
+ 2048
1386
+ ],
1387
+ "dtype": "float16",
1388
+ "format": "f32-to-bf16",
1389
+ "nbytes": 4096,
1390
+ "byteOffset": 20971520
1391
+ },
1392
+ {
1393
+ "name": "model.layers.9.post_attention_layernorm.weight",
1394
+ "shape": [
1395
+ 2048
1396
+ ],
1397
+ "dtype": "float16",
1398
+ "format": "f32-to-bf16",
1399
+ "nbytes": 4096,
1400
+ "byteOffset": 20975616
1401
+ }
1402
+ ],
1403
+ "md5sum": "50b62008e92be3380dbc8b46c48828d2"
1404
+ },
1405
+ {
1406
+ "dataPath": "params_shard_48.bin",
1407
+ "format": "raw-shard",
1408
+ "nbytes": 20975616,
1409
+ "records": [
1410
+ {
1411
+ "name": "model.layers.9.self_attn.qkv_proj.weight",
1412
+ "shape": [
1413
+ 3072,
1414
+ 2048
1415
+ ],
1416
+ "dtype": "float16",
1417
+ "format": "f32-to-bf16",
1418
+ "nbytes": 12582912,
1419
+ "byteOffset": 0
1420
+ },
1421
+ {
1422
+ "name": "model.layers.9.self_attn.o_proj.weight",
1423
+ "shape": [
1424
+ 2048,
1425
+ 2048
1426
+ ],
1427
+ "dtype": "float16",
1428
+ "format": "f32-to-bf16",
1429
+ "nbytes": 8388608,
1430
+ "byteOffset": 12582912
1431
+ },
1432
+ {
1433
+ "name": "model.norm.weight",
1434
+ "shape": [
1435
+ 2048
1436
+ ],
1437
+ "dtype": "float16",
1438
+ "format": "f32-to-bf16",
1439
+ "nbytes": 4096,
1440
+ "byteOffset": 20971520
1441
+ }
1442
+ ],
1443
+ "md5sum": "bdbdfa1c75499c5c12840c1a685e2f44"
1444
+ }
1445
+ ]
1446
+ }
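
The `ndarray-cache.json` manifest above maps each named tensor to a byte range inside a raw shard file: per record, a `shape`, `dtype`, `nbytes`, and `byteOffset` within the shard's `dataPath`. As a minimal sketch (assuming nothing beyond the JSON structure shown, and not part of this repository), one can check a shard entry's internal consistency — record sizes match shape × dtype size, records are tightly packed, and they sum to the shard's `nbytes`:

```python
import json
import math

# Shard data copied verbatim from the params_shard_48.bin entry above.
shard = json.loads("""
{
  "dataPath": "params_shard_48.bin",
  "format": "raw-shard",
  "nbytes": 20975616,
  "records": [
    {"name": "model.layers.9.self_attn.qkv_proj.weight", "shape": [3072, 2048],
     "dtype": "float16", "format": "f32-to-bf16", "nbytes": 12582912, "byteOffset": 0},
    {"name": "model.layers.9.self_attn.o_proj.weight", "shape": [2048, 2048],
     "dtype": "float16", "format": "f32-to-bf16", "nbytes": 8388608, "byteOffset": 12582912},
    {"name": "model.norm.weight", "shape": [2048],
     "dtype": "float16", "format": "f32-to-bf16", "nbytes": 4096, "byteOffset": 20971520}
  ]
}
""")

ITEMSIZE = {"float16": 2, "float32": 4}  # bytes per element

def validate_shard(shard):
    """Return the total payload size after checking each record's size and offset."""
    end = 0
    for rec in shard["records"]:
        expected = math.prod(rec["shape"]) * ITEMSIZE[rec["dtype"]]
        assert expected == rec["nbytes"], rec["name"]
        assert rec["byteOffset"] == end, rec["name"]  # records are packed back to back
        end += rec["nbytes"]
    assert end == shard["nbytes"], shard["dataPath"]
    return end

print(validate_shard(shard))  # 20975616, matching the shard's "nbytes"
```

For example, `3072 * 2048 * 2 = 12582912` bytes for the fp16 `qkv_proj` weight, which is exactly the `nbytes` recorded for it.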
params_shard_0.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c214e176a8e43329defccb3a6d6e11b75a0ecff5b967a18d898849a103a574d5
+ size 525336576
params_shard_1.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:700b152607e1b2d3e89de359b5d7ae216383c34077aa02aac6e52477f45eb75d
+ size 33554432
params_shard_10.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1089e327f40ec0e5180eb2c85d129bcc4f4708cde820b017fb16eca815d9a363
+ size 67108864
params_shard_11.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bd25f6fa58c91930df8604e55e08b8875a288a7448a2448c9856da50314bec49
+ size 20979712
params_shard_12.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6d1a815cea5efd5887c5f2a170c4863d5b4dda283ffda6e977cf559856a0249e
+ size 33554432
params_shard_13.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:159d759832bef0f532ffa90d02c7ebe69cc5c57956c2f0abbe87d816e7715396
+ size 67108864
params_shard_14.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:16475c045a3377a03403e2df89db78f48197961771aac97ea7aeeddae9a82235
+ size 20979712
params_shard_15.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e6bc762566ab8d9af1226ba8ea958d1d9a1270dfd31058e0eee74ca5922b107
+ size 33554432
params_shard_16.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b6fecae8ad9b966d8aa71f81bb21eaf5c566a1c39b8f095778e7058011e8cf0d
+ size 67108864
params_shard_17.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0755b1734fb7ab64efbf03cda3c4876cff5d1bba3820362a98d51740a4b6d6d0
+ size 20979712
params_shard_18.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:04a6bd1aede5799f69604778ad73e5ee87c25df19665edf32ba8234fa44e1484
+ size 33554432
params_shard_19.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cb34b1a11a5d6e85bcbf6e496ab2ea4308e2b75fcd36738babd15014e89c1a0d
+ size 67108864
params_shard_2.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:64746fcec71a162538c5ca36ee878cbdce9132687ec7afc730a0824752a02062
+ size 67108864
params_shard_20.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2474097267b9a9104fd244ef80753f00c0421a568c88d1a47997b1eea1ffc728
+ size 20979712
params_shard_21.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cf85e74ac35395aea0040a4a1e2b8f92633b96ed67e9883b8685e83029459e9f
+ size 33554432
params_shard_22.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3d02a714b039cc4a788bad50c2d37a53286f00f36368b46e8cbe048d0e47872e
+ size 67108864
params_shard_23.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9af27a69d587eee0d68b37fd52fe168f4753057a1cc81f28f6e202d3ef3c0fa7
+ size 20979712
params_shard_24.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:77be486c25e90fde59b9bbeb9650ea377d0d236f1c2ea28eaebb7290e39ad63e
+ size 33554432
params_shard_25.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3f9e9435ac65354fb971f79effef06b83111a727ff2b20bdbada4fca1133890c
+ size 67108864
params_shard_26.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:20d3957663bed4d3d811a165f06a9cd6693165acf1c364f5bf275999ddf4c980
+ size 20979712
params_shard_27.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fd3568b232db3053c9ace9860617f42633545497118172f37d4532d0ec8bf321
+ size 33554432
params_shard_28.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0e64dec47aaf9e16b0d178082ade7bd81dfc257cbc2bca6f1c5d615da85adfc6
+ size 67108864
params_shard_29.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:683daa0c372a06e55d096ecf51deafd9de9c43f998a257cd728e5ee6eef5c935
+ size 20979712
params_shard_3.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0d2d36bb5e0a47709d62d6a8320247bd69b5124ce338739a42a149dac93fac86
+ size 33554432
params_shard_30.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1a73972b0c1a9eef072850f67b60e5b2c9a545c2a7fa9cccc68549f088be5c8f
+ size 33554432
params_shard_31.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fc6ca0b3b36e741a9460fca85ef4bcdc7503af2f2bd19ef8b850ef6414b34325
+ size 67108864
params_shard_32.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b5327e3eb2b35681320139878ef598c22ac3dda0472a0dd895c6070495a00311
+ size 20979712
params_shard_33.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:91fa7418b540dfff9ecb0e36ac5bdad9fd82ebd3fcac9daca22ea50441b9d1bc
+ size 33554432
params_shard_34.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e63b76285dfa11f774a881100afced1e3bb073336ab281b89ff0de75cdda8bf8
+ size 67108864
params_shard_35.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e6e0b886106b378bee9b8f8444e62da5e13740a12feb07e351c570fb689089ac
+ size 20979712
params_shard_36.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:897a4bd69caefa8d7ee3d1df58ef9f93d1e795ebde121c21b5216a6d145879ec
+ size 33554432
params_shard_37.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9d02eb690bc1987a1f019f5d898cb1fe7c7a793bf50cd60381edad6e09e08c07
+ size 67108864
params_shard_38.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8741a53b21b0a364f1ce78c47a1a870dbc183f54e32e47924f60b373ec0c1b82
+ size 20979712
params_shard_39.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0b8606795181efba208881c2f0b63de512d6bd2ed6eeac72adba8c12aeab4d83
+ size 33554432
params_shard_4.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1b6c2c9adcd35c1da176e6608f4d4fd560d09b800c0f91c16b2eecb3a6fee0ee
+ size 67108864
params_shard_40.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dc64681d16432fd3b5e7bdb5d5019c178511bd998a01d9f394d64dc121a618ee
+ size 67108864
params_shard_41.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:56e4a9e65e2c9fec2078d28cacedefeccfbec48828248931cee9058e7c0a03b2
+ size 20979712
params_shard_42.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b76e03985cd9ded47cadf01f896da84e5c3bc4f8896390f91c73d5a53aab6482
+ size 33554432
params_shard_43.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e043c0a7f0e03d813a87d8f721cedc60174b5926c4d7eafe8644ade3d7d17f2
+ size 67108864
params_shard_44.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7240a872629d1c036f03cdad66a2d7f0257139d375cdb810b27167e53669764b
+ size 20979712
params_shard_45.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:60898c873aa6dbbfaba9ca6337001665365091484147a44c20828e6ce1edc583
+ size 33554432
params_shard_46.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:25475ec182dd2cc867e15d5798285cd2f7fd5717dba03ecd574be0a6359c03d1
+ size 67108864
params_shard_47.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:18ce44b582b6f2dfa109e1936417e985b01acd149f2dd93f6a39e68872005827
+ size 20979712
params_shard_48.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4408107b52beffcdce5cf0c7d4e06343e5d374afebbdf0135f6841008bf1111f
+ size 20975616
params_shard_5.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f0ba0a8aa0d4df0f87e3d28a6a9bd87963a951ccdce142ee7fee6606fd7a404d
+ size 20987904
params_shard_6.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dcc01d9dfbd65023ea065cc21b9824b344e7dc55e32ba401f9e733a55034f1f0
+ size 33554432
params_shard_7.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f09f23913b26ab368405a8069bf274db56adce3c36e330c1e930e9e421eafe47
+ size 67108864
params_shard_8.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c353c67b5f35e707862747d69cae85bd6b292caa324bf2ef2e71614f97047c1f
+ size 20979712
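
Each `.bin` file in this commit is stored via Git LFS: what Git actually tracks is a three-line text pointer (`version`, `oid`, `size`) that stands in for the binary payload, which is why every shard shows up in the diff as three added lines. A minimal sketch (not a Git LFS client, just a parser for the stub format) of reading such a pointer, using the `params_shard_8.bin` stub above:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# Pointer contents copied from the params_shard_8.bin stub above.
pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:c353c67b5f35e707862747d69cae85bd6b292caa324bf2ef2e71614f97047c1f\n"
    "size 20979712\n"
)

info = parse_lfs_pointer(pointer)
print(info["oid"])   # the SHA-256 of the real binary shard
print(info["size"])  # "20979712" -- payload size in bytes, as a string
```

The `size` field of each pointer matches the `nbytes` of the corresponding shard in `ndarray-cache.json` (20979712 bytes for `params_shard_8.bin`), since the pointer describes the same raw shard file.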