TearGosling committed (verified)
Commit db7514e · 1 parent: db61101

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
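The added rule makes Git LFS handle `tokenizer.json` the same way the existing patterns handle archives and TensorBoard event files. As a rough illustration of which paths these patterns cover (a sketch only: gitattributes uses gitignore-style globs, which Python's `fnmatch` merely approximates for simple patterns like these):

```python
import fnmatch

# Patterns from the .gitattributes hunk above (illustrative subset).
LFS_PATTERNS = ["*.zip", "*.zst", "*tfevents*", "tokenizer.json"]

def lfs_tracked(path):
    """Approximate check: does any LFS filter pattern match this path?"""
    return any(fnmatch.fnmatch(path, pat) for pat in LFS_PATTERNS)

print(lfs_tracked("tokenizer.json"))                  # True
print(lfs_tracked("events.out.tfevents.1700000000"))  # True
print(lfs_tracked("config.json"))                     # False
```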
README.md ADDED
@@ -0,0 +1,647 @@
---
base_model: []
library_name: transformers
tags:
- mergekit
- merge

---
# evo_model_test

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details
### Merge Method

This model was merged using the [DARE](https://arxiv.org/abs/2311.03099) [TIES](https://arxiv.org/abs/2306.01708) merge method, with ./evolve_merges/input_models/merge-10162024_972739363 as the base.

### Models Merged

The following models were included in the merge:
* ./evolve_merges/input_models/Magnum-Picaro-0.7-v2-12b_3809452655
* ./evolve_merges/input_models/Chronos-Gold-12B-1.0_1861025797
* ./evolve_merges/input_models/MN-12B-Mag-Mell-R1_399051020

### Configuration

The following YAML configuration was used to produce this model:

```yaml
base_model: ./evolve_merges/input_models/merge-10162024_972739363
dtype: bfloat16
merge_method: dare_ties
parameters:
  int8_mask: 1.0
  normalize: 1.0
slices:
- sources:
  - layer_range: [0, 4]
    model: ./evolve_merges/input_models/merge-10162024_972739363
    parameters:
      density:
      - filter: self_attn
        value: 0.6617851833521375
      - filter: mlp
        value: 1.0
      - value: 0.7758506135029611
      weight:
      - filter: self_attn
        value: 0.06553850894305135
      - filter: mlp
        value: 0.32372893196093133
      - value: 0.24761893893703177
  - layer_range: [0, 4]
    model: ./evolve_merges/input_models/Magnum-Picaro-0.7-v2-12b_3809452655
    parameters:
      density:
      - filter: self_attn
        value: 0.8619096186212604
      - filter: mlp
        value: 0.9632945037149085
      - value: 1.0
      weight:
      - filter: self_attn
        value: 0.5496368676404241
      - filter: mlp
        value: 0.2817627768141395
      - value: 0.2831242003449033
  - layer_range: [0, 4]
    model: ./evolve_merges/input_models/Chronos-Gold-12B-1.0_1861025797
    parameters:
      density:
      - filter: self_attn
        value: 1.0
      - filter: mlp
        value: 1.0
      - value: 0.9238831652008582
      weight:
      - filter: self_attn
        value: 0.6983534009784523
      - filter: mlp
        value: 0.7786486269006042
      - value: 0.3362711484417948
  - layer_range: [0, 4]
    model: ./evolve_merges/input_models/MN-12B-Mag-Mell-R1_399051020
    parameters:
      density:
      - filter: self_attn
        value: 1.0
      - filter: mlp
        value: 1.0
      - value: 0.897712174766424
      weight:
      - filter: self_attn
        value: 0.6494468053120542
      - filter: mlp
        value: 0.11769817501358182
      - value: 0.23745407940550356
- sources:
  - layer_range: [4, 8]
    model: ./evolve_merges/input_models/merge-10162024_972739363
    parameters:
      density:
      - filter: self_attn
        value: 0.768056839478356
      - filter: mlp
        value: 0.7392675781352855
      - value: 1.0
      weight:
      - filter: self_attn
        value: 0.4137398667324908
      - filter: mlp
        value: 0.5364761127195374
      - value: -0.06120952450996993
  - layer_range: [4, 8]
    model: ./evolve_merges/input_models/Magnum-Picaro-0.7-v2-12b_3809452655
    parameters:
      density:
      - filter: self_attn
        value: 0.9328263901133284
      - filter: mlp
        value: 1.0
      - value: 1.0
      weight:
      - filter: self_attn
        value: 0.512662918449004
      - filter: mlp
        value: 0.8133160093541117
      - value: 0.09518477923218693
  - layer_range: [4, 8]
    model: ./evolve_merges/input_models/Chronos-Gold-12B-1.0_1861025797
    parameters:
      density:
      - filter: self_attn
        value: 1.0
      - filter: mlp
        value: 1.0
      - value: 1.0
      weight:
      - filter: self_attn
        value: 0.6534355737222919
      - filter: mlp
        value: -0.2733724467069448
      - value: 0.35896371241039604
  - layer_range: [4, 8]
    model: ./evolve_merges/input_models/MN-12B-Mag-Mell-R1_399051020
    parameters:
      density:
      - filter: self_attn
        value: 1.0
      - filter: mlp
        value: 0.9645408518441749
      - value: 0.9920721804462888
      weight:
      - filter: self_attn
        value: 0.043888879112993606
      - filter: mlp
        value: 0.37533863309727755
      - value: 0.32692015564467836
- sources:
  - layer_range: [8, 12]
    model: ./evolve_merges/input_models/merge-10162024_972739363
    parameters:
      density:
      - filter: self_attn
        value: 0.9340306321054911
      - filter: mlp
        value: 1.0
      - value: 0.7968276665543247
      weight:
      - filter: self_attn
        value: 0.14846986084920036
      - filter: mlp
        value: 0.3955452929300913
      - value: 0.4270837195831495
  - layer_range: [8, 12]
    model: ./evolve_merges/input_models/Magnum-Picaro-0.7-v2-12b_3809452655
    parameters:
      density:
      - filter: self_attn
        value: 1.0
      - filter: mlp
        value: 1.0
      - value: 1.0
      weight:
      - filter: self_attn
        value: 0.3649415030710907
      - filter: mlp
        value: 0.16275044387393922
      - value: 0.2758727640654811
  - layer_range: [8, 12]
    model: ./evolve_merges/input_models/Chronos-Gold-12B-1.0_1861025797
    parameters:
      density:
      - filter: self_attn
        value: 0.8295983370283204
      - filter: mlp
        value: 0.7788134370117827
      - value: 0.9398894811483364
      weight:
      - filter: self_attn
        value: 0.28746483121862637
      - filter: mlp
        value: 0.3358374043922244
      - value: 0.2275533582239845
  - layer_range: [8, 12]
    model: ./evolve_merges/input_models/MN-12B-Mag-Mell-R1_399051020
    parameters:
      density:
      - filter: self_attn
        value: 1.0
      - filter: mlp
        value: 1.0
      - value: 0.727821766634972
      weight:
      - filter: self_attn
        value: 0.3081244623443608
      - filter: mlp
        value: 0.45014674558784984
      - value: 0.11047219740073362
- sources:
  - layer_range: [12, 16]
    model: ./evolve_merges/input_models/merge-10162024_972739363
    parameters:
      density:
      - filter: self_attn
        value: 0.6489316039694529
      - filter: mlp
        value: 1.0
      - value: 0.8272372022626591
      weight:
      - filter: self_attn
        value: 0.470708064142626
      - filter: mlp
        value: -0.047129110924588186
      - value: 0.42971949234723295
  - layer_range: [12, 16]
    model: ./evolve_merges/input_models/Magnum-Picaro-0.7-v2-12b_3809452655
    parameters:
      density:
      - filter: self_attn
        value: 1.0
      - filter: mlp
        value: 0.6616234442454084
      - value: 1.0
      weight:
      - filter: self_attn
        value: 0.26282202905677127
      - filter: mlp
        value: 0.4448525732857457
      - value: 0.2229765978922556
  - layer_range: [12, 16]
    model: ./evolve_merges/input_models/Chronos-Gold-12B-1.0_1861025797
    parameters:
      density:
      - filter: self_attn
        value: 1.0
      - filter: mlp
        value: 0.6135513085208061
      - value: 0.9581737790930396
      weight:
      - filter: self_attn
        value: 0.24444794214178578
      - filter: mlp
        value: 0.07937992720612315
      - value: -0.05228450555064985
  - layer_range: [12, 16]
    model: ./evolve_merges/input_models/MN-12B-Mag-Mell-R1_399051020
    parameters:
      density:
      - filter: self_attn
        value: 1.0
      - filter: mlp
        value: 1.0
      - value: 1.0
      weight:
      - filter: self_attn
        value: 0.1719406804216106
      - filter: mlp
        value: 0.0934880168140769
      - value: 0.35045642161724166
- sources:
  - layer_range: [16, 20]
    model: ./evolve_merges/input_models/merge-10162024_972739363
    parameters:
      density:
      - filter: self_attn
        value: 0.5446785752563841
      - filter: mlp
        value: 0.8810586946591301
      - value: 0.9152297583356134
      weight:
      - filter: self_attn
        value: -0.0016341576761690624
      - filter: mlp
        value: -0.14493024949671152
      - value: 0.26832439639581773
  - layer_range: [16, 20]
    model: ./evolve_merges/input_models/Magnum-Picaro-0.7-v2-12b_3809452655
    parameters:
      density:
      - filter: self_attn
        value: 1.0
      - filter: mlp
        value: 0.5944606032155147
      - value: 0.9302142529770252
      weight:
      - filter: self_attn
        value: 0.35950618403078893
      - filter: mlp
        value: 0.11051887834512175
      - value: 0.42291230769302385
  - layer_range: [16, 20]
    model: ./evolve_merges/input_models/Chronos-Gold-12B-1.0_1861025797
    parameters:
      density:
      - filter: self_attn
        value: 1.0
      - filter: mlp
        value: 0.6546859569496538
      - value: 0.8503723026949942
      weight:
      - filter: self_attn
        value: 0.35331354069135923
      - filter: mlp
        value: 0.11666399796526544
      - value: 0.027977616826786067
  - layer_range: [16, 20]
    model: ./evolve_merges/input_models/MN-12B-Mag-Mell-R1_399051020
    parameters:
      density:
      - filter: self_attn
        value: 0.8237153213010172
      - filter: mlp
        value: 0.7779880619326531
      - value: 1.0
      weight:
      - filter: self_attn
        value: 0.7145318763470817
      - filter: mlp
        value: 0.4104048815986916
      - value: 0.07468194955613425
- sources:
  - layer_range: [20, 24]
    model: ./evolve_merges/input_models/merge-10162024_972739363
    parameters:
      density:
      - filter: self_attn
        value: 0.5231923060339636
      - filter: mlp
        value: 1.0
      - value: 0.9856713754180749
      weight:
      - filter: self_attn
        value: 0.4081014822719611
      - filter: mlp
        value: 0.09758488254406042
      - value: 0.3348194266336727
  - layer_range: [20, 24]
    model: ./evolve_merges/input_models/Magnum-Picaro-0.7-v2-12b_3809452655
    parameters:
      density:
      - filter: self_attn
        value: 1.0
      - filter: mlp
        value: 1.0
      - value: 1.0
      weight:
      - filter: self_attn
        value: 0.7490383834336071
      - filter: mlp
        value: 0.4662047924812158
      - value: -0.24858277913931304
  - layer_range: [20, 24]
    model: ./evolve_merges/input_models/Chronos-Gold-12B-1.0_1861025797
    parameters:
      density:
      - filter: self_attn
        value: 1.0
      - filter: mlp
        value: 1.0
      - value: 0.8502797089454639
      weight:
      - filter: self_attn
        value: 0.276884170342346
      - filter: mlp
        value: 0.633656940319029
      - value: 0.5235799339573071
  - layer_range: [20, 24]
    model: ./evolve_merges/input_models/MN-12B-Mag-Mell-R1_399051020
    parameters:
      density:
      - filter: self_attn
        value: 1.0
      - filter: mlp
        value: 0.8562223977334964
      - value: 0.9716150483673114
      weight:
      - filter: self_attn
        value: 0.5270260765195226
      - filter: mlp
        value: 0.32711936701658684
      - value: 0.05670152518434478
- sources:
  - layer_range: [24, 28]
    model: ./evolve_merges/input_models/merge-10162024_972739363
    parameters:
      density:
      - filter: self_attn
        value: 1.0
      - filter: mlp
        value: 1.0
      - value: 0.8553635955278736
      weight:
      - filter: self_attn
        value: 0.35406982791511876
      - filter: mlp
        value: -0.11643971781340703
      - value: 0.20075532527415488
  - layer_range: [24, 28]
    model: ./evolve_merges/input_models/Magnum-Picaro-0.7-v2-12b_3809452655
    parameters:
      density:
      - filter: self_attn
        value: 1.0
      - filter: mlp
        value: 0.87297120460794
      - value: 1.0
      weight:
      - filter: self_attn
        value: 0.07480839031742999
      - filter: mlp
        value: 0.18311115096539785
      - value: 0.3625508152553395
  - layer_range: [24, 28]
    model: ./evolve_merges/input_models/Chronos-Gold-12B-1.0_1861025797
    parameters:
      density:
      - filter: self_attn
        value: 1.0
      - filter: mlp
        value: 1.0
      - value: 1.0
      weight:
      - filter: self_attn
        value: 0.494667527482752
      - filter: mlp
        value: 0.3944202674139632
      - value: -0.19227439649461792
  - layer_range: [24, 28]
    model: ./evolve_merges/input_models/MN-12B-Mag-Mell-R1_399051020
    parameters:
      density:
      - filter: self_attn
        value: 1.0
      - filter: mlp
        value: 1.0
      - value: 1.0
      weight:
      - filter: self_attn
        value: 0.06851638816347627
      - filter: mlp
        value: 0.431372227001768
      - value: 0.1747985843980182
- sources:
  - layer_range: [28, 32]
    model: ./evolve_merges/input_models/merge-10162024_972739363
    parameters:
      density:
      - filter: self_attn
        value: 0.9094528371038374
      - filter: mlp
        value: 1.0
      - value: 0.6090545725123906
      weight:
      - filter: self_attn
        value: 0.25309591486694805
      - filter: mlp
        value: -0.263292487608102
      - value: 0.1323202337738385
  - layer_range: [28, 32]
    model: ./evolve_merges/input_models/Magnum-Picaro-0.7-v2-12b_3809452655
    parameters:
      density:
      - filter: self_attn
        value: 0.6494843615875994
      - filter: mlp
        value: 1.0
      - value: 0.7515064103597758
      weight:
      - filter: self_attn
        value: 0.07729701084822604
      - filter: mlp
        value: 0.2170958326731126
      - value: 0.22214702687265422
  - layer_range: [28, 32]
    model: ./evolve_merges/input_models/Chronos-Gold-12B-1.0_1861025797
    parameters:
      density:
      - filter: self_attn
        value: 0.8431056158343985
      - filter: mlp
        value: 0.8838909258744341
      - value: 0.35295455870641634
      weight:
      - filter: self_attn
        value: 0.6551015978225493
      - filter: mlp
        value: 0.016410780482769546
      - value: 0.6370635339121399
  - layer_range: [28, 32]
    model: ./evolve_merges/input_models/MN-12B-Mag-Mell-R1_399051020
    parameters:
      density:
      - filter: self_attn
        value: 1.0
      - filter: mlp
        value: 1.0
      - value: 1.0
      weight:
      - filter: self_attn
        value: 0.04318024669287196
      - filter: mlp
        value: 0.7642269685567962
      - value: 0.26850603466331324
- sources:
  - layer_range: [32, 36]
    model: ./evolve_merges/input_models/merge-10162024_972739363
    parameters:
      density:
      - filter: self_attn
        value: 1.0
      - filter: mlp
        value: 1.0
      - value: 0.579520070097527
      weight:
      - filter: self_attn
        value: -0.051737601944818495
      - filter: mlp
        value: 0.3503787657405606
      - value: 0.08607827555366553
  - layer_range: [32, 36]
    model: ./evolve_merges/input_models/Magnum-Picaro-0.7-v2-12b_3809452655
    parameters:
      density:
      - filter: self_attn
        value: 1.0
      - filter: mlp
        value: 1.0
      - value: 1.0
      weight:
      - filter: self_attn
        value: 0.28766985337224327
      - filter: mlp
        value: 0.3046959778412749
      - value: -0.0005520428411238121
  - layer_range: [32, 36]
    model: ./evolve_merges/input_models/Chronos-Gold-12B-1.0_1861025797
    parameters:
      density:
      - filter: self_attn
        value: 1.0
      - filter: mlp
        value: 0.915429997855087
      - value: 1.0
      weight:
      - filter: self_attn
        value: 0.440410051026902
      - filter: mlp
        value: -0.21574554516791783
      - value: 0.15656972383477347
  - layer_range: [32, 36]
    model: ./evolve_merges/input_models/MN-12B-Mag-Mell-R1_399051020
    parameters:
      density:
      - filter: self_attn
        value: 1.0
      - filter: mlp
        value: 1.0
      - value: 1.0
      weight:
      - filter: self_attn
        value: 0.3263876152481672
      - filter: mlp
        value: -0.040618303294953154
      - value: 0.47900376528192473
- sources:
  - layer_range: [36, 40]
    model: ./evolve_merges/input_models/merge-10162024_972739363
    parameters:
      density:
      - filter: self_attn
        value: 0.9171778237104341
      - filter: mlp
        value: 0.7229727777891508
      - value: 0.9122033861491662
      weight:
      - filter: self_attn
        value: 0.6154987734241069
      - filter: mlp
        value: 0.3910860949496661
      - value: 0.5286422728941228
  - layer_range: [36, 40]
    model: ./evolve_merges/input_models/Magnum-Picaro-0.7-v2-12b_3809452655
    parameters:
      density:
      - filter: self_attn
        value: 0.6023409600465159
      - filter: mlp
        value: 1.0
      - value: 1.0
      weight:
      - filter: self_attn
        value: 0.39644253937030505
      - filter: mlp
        value: 0.7570672338863116
      - value: 0.10261227723433294
  - layer_range: [36, 40]
    model: ./evolve_merges/input_models/Chronos-Gold-12B-1.0_1861025797
    parameters:
      density:
      - filter: self_attn
        value: 1.0
      - filter: mlp
        value: 1.0
      - value: 0.8342554461687561
      weight:
      - filter: self_attn
        value: 0.4563403174251752
      - filter: mlp
        value: 0.313992481082509
      - value: 0.022583139471508834
  - layer_range: [36, 40]
    model: ./evolve_merges/input_models/MN-12B-Mag-Mell-R1_399051020
    parameters:
      density:
      - filter: self_attn
        value: 1.0
      - filter: mlp
        value: 0.9211392650515542
      - value: 1.0
      weight:
      - filter: self_attn
        value: -0.17092104595693997
      - filter: mlp
        value: 0.13032109680489912
      - value: -0.03480332269062497
```
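The configuration merges the four source models in ten independent four-layer slices. As a quick sanity check (a sketch; the `layer_range` values are copied from the YAML above), the half-open ranges should tile the full 40-layer stack with no gaps or overlaps:

```python
# layer_range entries from the mergekit config above (half-open intervals).
layer_ranges = [[0, 4], [4, 8], [8, 12], [12, 16], [16, 20],
                [20, 24], [24, 28], [28, 32], [32, 36], [36, 40]]

def covers_exactly(ranges, n_layers):
    """True if the ranges are contiguous, non-empty, and span [0, n_layers)."""
    prev_end = 0
    for start, end in ranges:
        if start != prev_end or end <= start:
            return False
        prev_end = end
    return prev_end == n_layers

print(covers_exactly(layer_ranges, 40))  # True: the slices tile all 40 layers
```

This matches `num_hidden_layers: 40` in the config.json below, so every layer of the output model is produced by exactly one slice.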
config.json ADDED
@@ -0,0 +1,27 @@
{
  "_name_or_path": "./evolve_merges/input_models/merge-10162024_972739363",
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 15,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 1024000,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 40,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.46.1",
  "use_cache": true,
  "vocab_size": 131072
}
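The config uses grouped-query attention (8 KV heads against 32 query heads) with an explicit `head_dim` of 128. A back-of-envelope sketch of what these values imply for inference memory: per generated token, the KV cache stores one key and one value vector per KV head per layer, at 2 bytes per element in bfloat16.

```python
# Values taken from the config.json above.
config = {
    "num_hidden_layers": 40,
    "num_key_value_heads": 8,   # GQA: 8 KV heads shared by 32 query heads
    "head_dim": 128,
}

def kv_cache_bytes_per_token(cfg, bytes_per_elem=2):
    # One K and one V vector (head_dim elements each) per KV head per layer.
    return (2 * cfg["num_hidden_layers"] * cfg["num_key_value_heads"]
            * cfg["head_dim"] * bytes_per_elem)

print(kv_cache_bytes_per_token(config))  # 163840 bytes, i.e. 160 KiB per token
```

At the advertised `max_position_embeddings` of 1024000 that would be roughly 156 GiB of KV cache for a single full-length sequence, which is why long contexts are usually served with shorter practical limits or cache quantization.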
mergekit_config.yml ADDED
@@ -0,0 +1,617 @@
1
+ base_model: ./evolve_merges/input_models/merge-10162024_972739363
2
+ dtype: bfloat16
3
+ merge_method: dare_ties
4
+ parameters:
5
+ int8_mask: 1.0
6
+ normalize: 1.0
7
+ slices:
8
+ - sources:
9
+ - layer_range: [0, 4]
10
+ model: ./evolve_merges/input_models/merge-10162024_972739363
11
+ parameters:
12
+ density:
13
+ - filter: self_attn
14
+ value: 0.6617851833521375
15
+ - filter: mlp
16
+ value: 1.0
17
+ - value: 0.7758506135029611
18
+ weight:
19
+ - filter: self_attn
20
+ value: 0.06553850894305135
21
+ - filter: mlp
22
+ value: 0.32372893196093133
23
+ - value: 0.24761893893703177
24
+ - layer_range: [0, 4]
25
+ model: ./evolve_merges/input_models/Magnum-Picaro-0.7-v2-12b_3809452655
26
+ parameters:
27
+ density:
28
+ - filter: self_attn
29
+ value: 0.8619096186212604
30
+ - filter: mlp
31
+ value: 0.9632945037149085
32
+ - value: 1.0
33
+ weight:
34
+ - filter: self_attn
35
+ value: 0.5496368676404241
36
+ - filter: mlp
37
+ value: 0.2817627768141395
38
+ - value: 0.2831242003449033
39
+ - layer_range: [0, 4]
40
+ model: ./evolve_merges/input_models/Chronos-Gold-12B-1.0_1861025797
41
+ parameters:
42
+ density:
43
+ - filter: self_attn
44
+ value: 1.0
45
+ - filter: mlp
46
+ value: 1.0
47
+ - value: 0.9238831652008582
48
+ weight:
49
+ - filter: self_attn
50
+ value: 0.6983534009784523
51
+ - filter: mlp
52
+ value: 0.7786486269006042
53
+ - value: 0.3362711484417948
54
+ - layer_range: [0, 4]
55
+ model: ./evolve_merges/input_models/MN-12B-Mag-Mell-R1_399051020
56
+ parameters:
57
+ density:
58
+ - filter: self_attn
59
+ value: 1.0
60
+ - filter: mlp
61
+ value: 1.0
62
+ - value: 0.897712174766424
63
+ weight:
64
+ - filter: self_attn
65
+ value: 0.6494468053120542
66
+ - filter: mlp
67
+ value: 0.11769817501358182
68
+ - value: 0.23745407940550356
69
+ - sources:
70
+ - layer_range: [4, 8]
71
+ model: ./evolve_merges/input_models/merge-10162024_972739363
72
+ parameters:
73
+ density:
74
+ - filter: self_attn
75
+ value: 0.768056839478356
76
+ - filter: mlp
77
+ value: 0.7392675781352855
78
+ - value: 1.0
79
+ weight:
80
+ - filter: self_attn
81
+ value: 0.4137398667324908
82
+ - filter: mlp
83
+ value: 0.5364761127195374
84
+ - value: -0.06120952450996993
85
+ - layer_range: [4, 8]
86
+ model: ./evolve_merges/input_models/Magnum-Picaro-0.7-v2-12b_3809452655
87
+ parameters:
88
+ density:
89
+ - filter: self_attn
90
+ value: 0.9328263901133284
91
+ - filter: mlp
92
+ value: 1.0
93
+ - value: 1.0
94
+ weight:
95
+ - filter: self_attn
96
+ value: 0.512662918449004
97
+ - filter: mlp
98
+ value: 0.8133160093541117
99
+ - value: 0.09518477923218693
100
+ - layer_range: [4, 8]
101
+ model: ./evolve_merges/input_models/Chronos-Gold-12B-1.0_1861025797
102
+ parameters:
103
+ density:
104
+ - filter: self_attn
105
+ value: 1.0
106
+ - filter: mlp
107
+ value: 1.0
108
+ - value: 1.0
109
+ weight:
110
+ - filter: self_attn
111
+ value: 0.6534355737222919
112
+ - filter: mlp
113
+ value: -0.2733724467069448
114
+ - value: 0.35896371241039604
115
+ - layer_range: [4, 8]
116
+ model: ./evolve_merges/input_models/MN-12B-Mag-Mell-R1_399051020
117
+ parameters:
118
+ density:
119
+ - filter: self_attn
120
+ value: 1.0
121
+ - filter: mlp
122
+ value: 0.9645408518441749
123
+ - value: 0.9920721804462888
124
+ weight:
125
+ - filter: self_attn
126
+ value: 0.043888879112993606
127
+ - filter: mlp
128
+ value: 0.37533863309727755
129
+ - value: 0.32692015564467836
130
+ - sources:
131
+ - layer_range: [8, 12]
132
+ model: ./evolve_merges/input_models/merge-10162024_972739363
133
+ parameters:
134
+ density:
135
+ - filter: self_attn
136
+ value: 0.9340306321054911
137
+ - filter: mlp
138
+ value: 1.0
139
+ - value: 0.7968276665543247
140
+ weight:
141
+ - filter: self_attn
142
+ value: 0.14846986084920036
143
+ - filter: mlp
144
+ value: 0.3955452929300913
145
+ - value: 0.4270837195831495
146
+ - layer_range: [8, 12]
147
+ model: ./evolve_merges/input_models/Magnum-Picaro-0.7-v2-12b_3809452655
148
+ parameters:
149
+ density:
150
+ - filter: self_attn
151
+ value: 1.0
152
+ - filter: mlp
153
+ value: 1.0
154
+ - value: 1.0
155
+ weight:
156
+ - filter: self_attn
157
+ value: 0.3649415030710907
158
+ - filter: mlp
159
+ value: 0.16275044387393922
160
+ - value: 0.2758727640654811
161
+ - layer_range: [8, 12]
162
+ model: ./evolve_merges/input_models/Chronos-Gold-12B-1.0_1861025797
163
+ parameters:
164
+ density:
165
+ - filter: self_attn
166
+ value: 0.8295983370283204
167
+ - filter: mlp
168
+ value: 0.7788134370117827
169
+ - value: 0.9398894811483364
170
+ weight:
171
+ - filter: self_attn
172
+ value: 0.28746483121862637
173
+ - filter: mlp
174
+ value: 0.3358374043922244
175
+ - value: 0.2275533582239845
176
+ - layer_range: [8, 12]
177
+ model: ./evolve_merges/input_models/MN-12B-Mag-Mell-R1_399051020
178
+ parameters:
179
+ density:
180
+ - filter: self_attn
181
+ value: 1.0
182
+ - filter: mlp
183
+ value: 1.0
184
+ - value: 0.727821766634972
185
+ weight:
186
+ - filter: self_attn
187
+ value: 0.3081244623443608
188
+ - filter: mlp
189
+ value: 0.45014674558784984
190
+ - value: 0.11047219740073362
191
+ - sources:
192
+ - layer_range: [12, 16]
193
+ model: ./evolve_merges/input_models/merge-10162024_972739363
194
+ parameters:
195
+ density:
196
+ - filter: self_attn
197
+ value: 0.6489316039694529
198
+ - filter: mlp
199
+ value: 1.0
200
+ - value: 0.8272372022626591
201
+ weight:
202
+ - filter: self_attn
203
+ value: 0.470708064142626
204
+ - filter: mlp
205
+ value: -0.047129110924588186
206
+ - value: 0.42971949234723295
207
+ - layer_range: [12, 16]
208
+ model: ./evolve_merges/input_models/Magnum-Picaro-0.7-v2-12b_3809452655
209
+ parameters:
210
+ density:
211
+ - filter: self_attn
212
+ value: 1.0
213
+ - filter: mlp
214
+ value: 0.6616234442454084
215
+ - value: 1.0
216
+ weight:
217
+ - filter: self_attn
218
+ value: 0.26282202905677127
219
+ - filter: mlp
220
+ value: 0.4448525732857457
221
+ - value: 0.2229765978922556
222
+ - layer_range: [12, 16]
223
+ model: ./evolve_merges/input_models/Chronos-Gold-12B-1.0_1861025797
224
+ parameters:
225
+ density:
226
+ - filter: self_attn
227
+ value: 1.0
228
+ - filter: mlp
229
+ value: 0.6135513085208061
230
+ - value: 0.9581737790930396
231
+ weight:
232
+ - filter: self_attn
233
+ value: 0.24444794214178578
234
+ - filter: mlp
235
+ value: 0.07937992720612315
236
+ - value: -0.05228450555064985
237
+ - layer_range: [12, 16]
238
+ model: ./evolve_merges/input_models/MN-12B-Mag-Mell-R1_399051020
239
+ parameters:
240
+ density:
241
+ - filter: self_attn
242
+ value: 1.0
243
+ - filter: mlp
244
+ value: 1.0
245
+ - value: 1.0
246
+ weight:
247
+ - filter: self_attn
248
+ value: 0.1719406804216106
249
+ - filter: mlp
250
+ value: 0.0934880168140769
251
+ - value: 0.35045642161724166
252
+ - sources:
253
+ - layer_range: [16, 20]
254
+ model: ./evolve_merges/input_models/merge-10162024_972739363
255
+ parameters:
256
+ density:
257
+ - filter: self_attn
258
+ value: 0.5446785752563841
259
+ - filter: mlp
260
+ value: 0.8810586946591301
261
+ - value: 0.9152297583356134
262
+ weight:
263
+ - filter: self_attn
264
+ value: -0.0016341576761690624
265
+ - filter: mlp
266
+ value: -0.14493024949671152
267
+ - value: 0.26832439639581773
268
+ - layer_range: [16, 20]
269
+ model: ./evolve_merges/input_models/Magnum-Picaro-0.7-v2-12b_3809452655
270
+ parameters:
271
+ density:
272
+ - filter: self_attn
273
+ value: 1.0
274
+ - filter: mlp
275
+ value: 0.5944606032155147
276
+ - value: 0.9302142529770252
277
+ weight:
278
+ - filter: self_attn
279
+ value: 0.35950618403078893
280
+ - filter: mlp
281
+ value: 0.11051887834512175
282
+ - value: 0.42291230769302385
283
+ - layer_range: [16, 20]
284
+ model: ./evolve_merges/input_models/Chronos-Gold-12B-1.0_1861025797
285
+ parameters:
286
+ density:
287
+ - filter: self_attn
288
+ value: 1.0
289
+ - filter: mlp
290
+ value: 0.6546859569496538
291
+ - value: 0.8503723026949942
292
+ weight:
293
+ - filter: self_attn
294
+ value: 0.35331354069135923
295
+ - filter: mlp
296
+ value: 0.11666399796526544
297
+ - value: 0.027977616826786067
298
+ - layer_range: [16, 20]
299
+ model: ./evolve_merges/input_models/MN-12B-Mag-Mell-R1_399051020
300
+ parameters:
301
+ density:
302
+ - filter: self_attn
303
+ value: 0.8237153213010172
304
+ - filter: mlp
305
+ value: 0.7779880619326531
306
+ - value: 1.0
307
+ weight:
308
+ - filter: self_attn
309
+ value: 0.7145318763470817
310
+ - filter: mlp
311
+ value: 0.4104048815986916
312
+ - value: 0.07468194955613425
313
+ - sources:
314
+ - layer_range: [20, 24]
315
+ model: ./evolve_merges/input_models/merge-10162024_972739363
316
+ parameters:
317
+ density:
318
+ - filter: self_attn
319
+ value: 0.5231923060339636
320
+ - filter: mlp
321
+ value: 1.0
322
+ - value: 0.9856713754180749
323
+ weight:
324
+ - filter: self_attn
325
+ value: 0.4081014822719611
326
+ - filter: mlp
327
+ value: 0.09758488254406042
328
+ - value: 0.3348194266336727
329
+ - layer_range: [20, 24]
330
+ model: ./evolve_merges/input_models/Magnum-Picaro-0.7-v2-12b_3809452655
331
+ parameters:
332
+ density:
333
+ - filter: self_attn
334
+ value: 1.0
335
+ - filter: mlp
336
+ value: 1.0
337
+ - value: 1.0
338
+ weight:
339
+ - filter: self_attn
340
+ value: 0.7490383834336071
341
+ - filter: mlp
342
+ value: 0.4662047924812158
343
+ - value: -0.24858277913931304
344
+ - layer_range: [20, 24]
345
+ model: ./evolve_merges/input_models/Chronos-Gold-12B-1.0_1861025797
346
+ parameters:
347
+ density:
348
+ - filter: self_attn
349
+ value: 1.0
350
+ - filter: mlp
351
+ value: 1.0
352
+ - value: 0.8502797089454639
353
+ weight:
354
+ - filter: self_attn
355
+ value: 0.276884170342346
356
+ - filter: mlp
357
+ value: 0.633656940319029
358
+ - value: 0.5235799339573071
359
+ - layer_range: [20, 24]
360
+ model: ./evolve_merges/input_models/MN-12B-Mag-Mell-R1_399051020
361
+ parameters:
362
+ density:
363
+ - filter: self_attn
364
+ value: 1.0
365
+ - filter: mlp
366
+ value: 0.8562223977334964
367
+ - value: 0.9716150483673114
368
+ weight:
369
+ - filter: self_attn
370
+ value: 0.5270260765195226
371
+ - filter: mlp
372
+ value: 0.32711936701658684
373
+ - value: 0.05670152518434478
374
+ - sources:
+ - layer_range: [24, 28]
+ model: ./evolve_merges/input_models/merge-10162024_972739363
+ parameters:
+ density:
+ - filter: self_attn
+ value: 1.0
+ - filter: mlp
+ value: 1.0
+ - value: 0.8553635955278736
+ weight:
+ - filter: self_attn
+ value: 0.35406982791511876
+ - filter: mlp
+ value: -0.11643971781340703
+ - value: 0.20075532527415488
+ - layer_range: [24, 28]
+ model: ./evolve_merges/input_models/Magnum-Picaro-0.7-v2-12b_3809452655
+ parameters:
+ density:
+ - filter: self_attn
+ value: 1.0
+ - filter: mlp
+ value: 0.87297120460794
+ - value: 1.0
+ weight:
+ - filter: self_attn
+ value: 0.07480839031742999
+ - filter: mlp
+ value: 0.18311115096539785
+ - value: 0.3625508152553395
+ - layer_range: [24, 28]
+ model: ./evolve_merges/input_models/Chronos-Gold-12B-1.0_1861025797
+ parameters:
+ density:
+ - filter: self_attn
+ value: 1.0
+ - filter: mlp
+ value: 1.0
+ - value: 1.0
+ weight:
+ - filter: self_attn
+ value: 0.494667527482752
+ - filter: mlp
+ value: 0.3944202674139632
+ - value: -0.19227439649461792
+ - layer_range: [24, 28]
+ model: ./evolve_merges/input_models/MN-12B-Mag-Mell-R1_399051020
+ parameters:
+ density:
+ - filter: self_attn
+ value: 1.0
+ - filter: mlp
+ value: 1.0
+ - value: 1.0
+ weight:
+ - filter: self_attn
+ value: 0.06851638816347627
+ - filter: mlp
+ value: 0.431372227001768
+ - value: 0.1747985843980182
+ - sources:
+ - layer_range: [28, 32]
+ model: ./evolve_merges/input_models/merge-10162024_972739363
+ parameters:
+ density:
+ - filter: self_attn
+ value: 0.9094528371038374
+ - filter: mlp
+ value: 1.0
+ - value: 0.6090545725123906
+ weight:
+ - filter: self_attn
+ value: 0.25309591486694805
+ - filter: mlp
+ value: -0.263292487608102
+ - value: 0.1323202337738385
+ - layer_range: [28, 32]
+ model: ./evolve_merges/input_models/Magnum-Picaro-0.7-v2-12b_3809452655
+ parameters:
+ density:
+ - filter: self_attn
+ value: 0.6494843615875994
+ - filter: mlp
+ value: 1.0
+ - value: 0.7515064103597758
+ weight:
+ - filter: self_attn
+ value: 0.07729701084822604
+ - filter: mlp
+ value: 0.2170958326731126
+ - value: 0.22214702687265422
+ - layer_range: [28, 32]
+ model: ./evolve_merges/input_models/Chronos-Gold-12B-1.0_1861025797
+ parameters:
+ density:
+ - filter: self_attn
+ value: 0.8431056158343985
+ - filter: mlp
+ value: 0.8838909258744341
+ - value: 0.35295455870641634
+ weight:
+ - filter: self_attn
+ value: 0.6551015978225493
+ - filter: mlp
+ value: 0.016410780482769546
+ - value: 0.6370635339121399
+ - layer_range: [28, 32]
+ model: ./evolve_merges/input_models/MN-12B-Mag-Mell-R1_399051020
+ parameters:
+ density:
+ - filter: self_attn
+ value: 1.0
+ - filter: mlp
+ value: 1.0
+ - value: 1.0
+ weight:
+ - filter: self_attn
+ value: 0.04318024669287196
+ - filter: mlp
+ value: 0.7642269685567962
+ - value: 0.26850603466331324
+ - sources:
+ - layer_range: [32, 36]
+ model: ./evolve_merges/input_models/merge-10162024_972739363
+ parameters:
+ density:
+ - filter: self_attn
+ value: 1.0
+ - filter: mlp
+ value: 1.0
+ - value: 0.579520070097527
+ weight:
+ - filter: self_attn
+ value: -0.051737601944818495
+ - filter: mlp
+ value: 0.3503787657405606
+ - value: 0.08607827555366553
+ - layer_range: [32, 36]
+ model: ./evolve_merges/input_models/Magnum-Picaro-0.7-v2-12b_3809452655
+ parameters:
+ density:
+ - filter: self_attn
+ value: 1.0
+ - filter: mlp
+ value: 1.0
+ - value: 1.0
+ weight:
+ - filter: self_attn
+ value: 0.28766985337224327
+ - filter: mlp
+ value: 0.3046959778412749
+ - value: -0.0005520428411238121
+ - layer_range: [32, 36]
+ model: ./evolve_merges/input_models/Chronos-Gold-12B-1.0_1861025797
+ parameters:
+ density:
+ - filter: self_attn
+ value: 1.0
+ - filter: mlp
+ value: 0.915429997855087
+ - value: 1.0
+ weight:
+ - filter: self_attn
+ value: 0.440410051026902
+ - filter: mlp
+ value: -0.21574554516791783
+ - value: 0.15656972383477347
+ - layer_range: [32, 36]
+ model: ./evolve_merges/input_models/MN-12B-Mag-Mell-R1_399051020
+ parameters:
+ density:
+ - filter: self_attn
+ value: 1.0
+ - filter: mlp
+ value: 1.0
+ - value: 1.0
+ weight:
+ - filter: self_attn
+ value: 0.3263876152481672
+ - filter: mlp
+ value: -0.040618303294953154
+ - value: 0.47900376528192473
+ - sources:
+ - layer_range: [36, 40]
+ model: ./evolve_merges/input_models/merge-10162024_972739363
+ parameters:
+ density:
+ - filter: self_attn
+ value: 0.9171778237104341
+ - filter: mlp
+ value: 0.7229727777891508
+ - value: 0.9122033861491662
+ weight:
+ - filter: self_attn
+ value: 0.6154987734241069
+ - filter: mlp
+ value: 0.3910860949496661
+ - value: 0.5286422728941228
+ - layer_range: [36, 40]
+ model: ./evolve_merges/input_models/Magnum-Picaro-0.7-v2-12b_3809452655
+ parameters:
+ density:
+ - filter: self_attn
+ value: 0.6023409600465159
+ - filter: mlp
+ value: 1.0
+ - value: 1.0
+ weight:
+ - filter: self_attn
+ value: 0.39644253937030505
+ - filter: mlp
+ value: 0.7570672338863116
+ - value: 0.10261227723433294
+ - layer_range: [36, 40]
+ model: ./evolve_merges/input_models/Chronos-Gold-12B-1.0_1861025797
+ parameters:
+ density:
+ - filter: self_attn
+ value: 1.0
+ - filter: mlp
+ value: 1.0
+ - value: 0.8342554461687561
+ weight:
+ - filter: self_attn
+ value: 0.4563403174251752
+ - filter: mlp
+ value: 0.313992481082509
+ - value: 0.022583139471508834
+ - layer_range: [36, 40]
+ model: ./evolve_merges/input_models/MN-12B-Mag-Mell-R1_399051020
+ parameters:
+ density:
+ - filter: self_attn
+ value: 1.0
+ - filter: mlp
+ value: 0.9211392650515542
+ - value: 1.0
+ weight:
+ - filter: self_attn
+ value: -0.17092104595693997
+ - filter: mlp
+ value: 0.13032109680489912
+ - value: -0.03480332269062497
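The slice layout in the merge config above can be sanity-checked programmatically: every source inside a slice should use the same `layer_range`, and consecutive slices should tile the layer stack without gaps or overlaps. Below is a minimal sketch (not part of the original upload; model names are placeholders) of such a check over a reduced, in-memory version of the `slices` structure.

```python
# Structural check for a mergekit-style "slices" list.
# Each slice merges the same layer_range from several source models;
# consecutive slices should cover a contiguous span of layers.

slices = [
    {"sources": [
        {"layer_range": [20, 24], "model": "modelA"},
        {"layer_range": [20, 24], "model": "modelB"},
    ]},
    {"sources": [
        {"layer_range": [24, 28], "model": "modelA"},
        {"layer_range": [24, 28], "model": "modelB"},
    ]},
]

def check_contiguous(slices):
    """True if all sources in each slice agree on layer_range and
    the ranges tile a contiguous interval across slices."""
    prev_end = None
    for s in slices:
        ranges = {tuple(src["layer_range"]) for src in s["sources"]}
        if len(ranges) != 1:          # sources within a slice must match
            return False
        start, end = next(iter(ranges))
        if prev_end is not None and start != prev_end:
            return False              # gap or overlap between slices
        prev_end = end
    return True

print(check_contiguous(slices))  # True
```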
model-00001-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9cf9bf80e5095e75914bbd342655a6be6111b2fe1e736df43fe21e7953729e63
+ size 4865489336
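The `.safetensors` entries in this commit are Git LFS pointer files rather than the weights themselves: three `key value` lines giving the spec version, the SHA-256 object id, and the payload size in bytes. A small illustrative sketch (not part of the upload) of parsing one such pointer:

```python
# Parse a Git LFS pointer file: newline-separated "key value" pairs.
# The example text mirrors the first pointer in this commit.

pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:9cf9bf80e5095e75914bbd342655a6be6111b2fe1e736df43fe21e7953729e63
size 4865489336
"""

def parse_lfs_pointer(text):
    """Return the pointer's fields as a dict of key -> value."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

ptr = parse_lfs_pointer(pointer_text)
print(ptr["size"])  # size of the real shard in bytes, as a string
```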
model-00002-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bb058813f69f8e625dfb7f32323f9bc2c483b1c2ca2c7b3ebd2ba7440df93cd4
+ size 4907529456
model-00003-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:43548cb74af48241d03866bcf88646dfb022d0b6bd765dcdf07cbee7b254ae29
+ size 4907529464
model-00004-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3856567a958c4c50f22fc9049f53e659df6f1d8a00bc9581ddfc4901fa845b23
+ size 4907529456
model-00005-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1fbd781d2b01ecc337e897b6e6a8e4424cda6c1be58db70844361e81bc325931
+ size 4907529392
model.safetensors.index.json ADDED
@@ -0,0 +1 @@
+ {"metadata": {"mergekit_version": "0.0.5.1", "total_size": 24495564800}, "weight_map": {"lm_head.weight": "model-00001-of-00005.safetensors", "model.embed_tokens.weight": "model-00001-of-00005.safetensors", "model.layers.0.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.0.mlp.down_proj.weight": "model-00001-of-00005.safetensors", "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00005.safetensors", "model.layers.0.mlp.up_proj.weight": "model-00001-of-00005.safetensors", "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00005.safetensors", "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00005.safetensors", "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00005.safetensors", "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00005.safetensors", "model.layers.1.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.1.mlp.down_proj.weight": "model-00001-of-00005.safetensors", "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00005.safetensors", "model.layers.1.mlp.up_proj.weight": "model-00001-of-00005.safetensors", "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00005.safetensors", "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00005.safetensors", "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00005.safetensors", "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00005.safetensors", "model.layers.10.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.10.mlp.down_proj.weight": "model-00001-of-00005.safetensors", "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00005.safetensors", "model.layers.10.mlp.up_proj.weight": "model-00001-of-00005.safetensors", "model.layers.10.post_attention_layernorm.weight": "model-00001-of-00005.safetensors", 
"model.layers.10.self_attn.k_proj.weight": "model-00001-of-00005.safetensors", "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00005.safetensors", "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00005.safetensors", "model.layers.10.self_attn.v_proj.weight": "model-00001-of-00005.safetensors", "model.layers.11.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.11.mlp.down_proj.weight": "model-00001-of-00005.safetensors", "model.layers.11.mlp.gate_proj.weight": "model-00001-of-00005.safetensors", "model.layers.11.mlp.up_proj.weight": "model-00001-of-00005.safetensors", "model.layers.11.post_attention_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.11.self_attn.k_proj.weight": "model-00001-of-00005.safetensors", "model.layers.11.self_attn.o_proj.weight": "model-00001-of-00005.safetensors", "model.layers.11.self_attn.q_proj.weight": "model-00001-of-00005.safetensors", "model.layers.11.self_attn.v_proj.weight": "model-00001-of-00005.safetensors", "model.layers.12.input_layernorm.weight": "model-00001-of-00005.safetensors", "model.layers.12.mlp.down_proj.weight": "model-00002-of-00005.safetensors", "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00005.safetensors", "model.layers.12.mlp.up_proj.weight": "model-00002-of-00005.safetensors", "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00005.safetensors", "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00005.safetensors", "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00005.safetensors", "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00005.safetensors", "model.layers.13.input_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.13.mlp.down_proj.weight": "model-00002-of-00005.safetensors", "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00005.safetensors", "model.layers.13.mlp.up_proj.weight": 
"model-00002-of-00005.safetensors", "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00005.safetensors", "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00005.safetensors", "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00005.safetensors", "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00005.safetensors", "model.layers.14.input_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.14.mlp.down_proj.weight": "model-00002-of-00005.safetensors", "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00005.safetensors", "model.layers.14.mlp.up_proj.weight": "model-00002-of-00005.safetensors", "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00005.safetensors", "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00005.safetensors", "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00005.safetensors", "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00005.safetensors", "model.layers.15.input_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.15.mlp.down_proj.weight": "model-00002-of-00005.safetensors", "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00005.safetensors", "model.layers.15.mlp.up_proj.weight": "model-00002-of-00005.safetensors", "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00005.safetensors", "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00005.safetensors", "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00005.safetensors", "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00005.safetensors", "model.layers.16.input_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.16.mlp.down_proj.weight": 
"model-00002-of-00005.safetensors", "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00005.safetensors", "model.layers.16.mlp.up_proj.weight": "model-00002-of-00005.safetensors", "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00005.safetensors", "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00005.safetensors", "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00005.safetensors", "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00005.safetensors", "model.layers.17.input_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.17.mlp.down_proj.weight": "model-00002-of-00005.safetensors", "model.layers.17.mlp.gate_proj.weight": "model-00002-of-00005.safetensors", "model.layers.17.mlp.up_proj.weight": "model-00002-of-00005.safetensors", "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00005.safetensors", "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00005.safetensors", "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00005.safetensors", "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00005.safetensors", "model.layers.18.input_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.18.mlp.down_proj.weight": "model-00002-of-00005.safetensors", "model.layers.18.mlp.gate_proj.weight": "model-00002-of-00005.safetensors", "model.layers.18.mlp.up_proj.weight": "model-00002-of-00005.safetensors", "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.18.self_attn.k_proj.weight": "model-00002-of-00005.safetensors", "model.layers.18.self_attn.o_proj.weight": "model-00002-of-00005.safetensors", "model.layers.18.self_attn.q_proj.weight": "model-00002-of-00005.safetensors", "model.layers.18.self_attn.v_proj.weight": 
"model-00002-of-00005.safetensors", "model.layers.19.input_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.19.mlp.down_proj.weight": "model-00002-of-00005.safetensors", "model.layers.19.mlp.gate_proj.weight": "model-00002-of-00005.safetensors", "model.layers.19.mlp.up_proj.weight": "model-00002-of-00005.safetensors", "model.layers.19.post_attention_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.19.self_attn.k_proj.weight": "model-00002-of-00005.safetensors", "model.layers.19.self_attn.o_proj.weight": "model-00002-of-00005.safetensors", "model.layers.19.self_attn.q_proj.weight": "model-00002-of-00005.safetensors", "model.layers.19.self_attn.v_proj.weight": "model-00002-of-00005.safetensors", "model.layers.2.input_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.2.mlp.down_proj.weight": "model-00002-of-00005.safetensors", "model.layers.2.mlp.gate_proj.weight": "model-00002-of-00005.safetensors", "model.layers.2.mlp.up_proj.weight": "model-00002-of-00005.safetensors", "model.layers.2.post_attention_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.2.self_attn.k_proj.weight": "model-00002-of-00005.safetensors", "model.layers.2.self_attn.o_proj.weight": "model-00002-of-00005.safetensors", "model.layers.2.self_attn.q_proj.weight": "model-00002-of-00005.safetensors", "model.layers.2.self_attn.v_proj.weight": "model-00002-of-00005.safetensors", "model.layers.20.input_layernorm.weight": "model-00002-of-00005.safetensors", "model.layers.20.mlp.down_proj.weight": "model-00003-of-00005.safetensors", "model.layers.20.mlp.gate_proj.weight": "model-00003-of-00005.safetensors", "model.layers.20.mlp.up_proj.weight": "model-00003-of-00005.safetensors", "model.layers.20.post_attention_layernorm.weight": "model-00003-of-00005.safetensors", "model.layers.20.self_attn.k_proj.weight": "model-00003-of-00005.safetensors", "model.layers.20.self_attn.o_proj.weight": "model-00003-of-00005.safetensors", 
"model.layers.20.self_attn.q_proj.weight": "model-00003-of-00005.safetensors", "model.layers.20.self_attn.v_proj.weight": "model-00003-of-00005.safetensors", "model.layers.21.input_layernorm.weight": "model-00003-of-00005.safetensors", "model.layers.21.mlp.down_proj.weight": "model-00003-of-00005.safetensors", "model.layers.21.mlp.gate_proj.weight": "model-00003-of-00005.safetensors", "model.layers.21.mlp.up_proj.weight": "model-00003-of-00005.safetensors", "model.layers.21.post_attention_layernorm.weight": "model-00003-of-00005.safetensors", "model.layers.21.self_attn.k_proj.weight": "model-00003-of-00005.safetensors", "model.layers.21.self_attn.o_proj.weight": "model-00003-of-00005.safetensors", "model.layers.21.self_attn.q_proj.weight": "model-00003-of-00005.safetensors", "model.layers.21.self_attn.v_proj.weight": "model-00003-of-00005.safetensors", "model.layers.22.input_layernorm.weight": "model-00003-of-00005.safetensors", "model.layers.22.mlp.down_proj.weight": "model-00003-of-00005.safetensors", "model.layers.22.mlp.gate_proj.weight": "model-00003-of-00005.safetensors", "model.layers.22.mlp.up_proj.weight": "model-00003-of-00005.safetensors", "model.layers.22.post_attention_layernorm.weight": "model-00003-of-00005.safetensors", "model.layers.22.self_attn.k_proj.weight": "model-00003-of-00005.safetensors", "model.layers.22.self_attn.o_proj.weight": "model-00003-of-00005.safetensors", "model.layers.22.self_attn.q_proj.weight": "model-00003-of-00005.safetensors", "model.layers.22.self_attn.v_proj.weight": "model-00003-of-00005.safetensors", "model.layers.23.input_layernorm.weight": "model-00003-of-00005.safetensors", "model.layers.23.mlp.down_proj.weight": "model-00003-of-00005.safetensors", "model.layers.23.mlp.gate_proj.weight": "model-00003-of-00005.safetensors", "model.layers.23.mlp.up_proj.weight": "model-00003-of-00005.safetensors", "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00005.safetensors", 
"model.layers.23.self_attn.k_proj.weight": "model-00003-of-00005.safetensors", "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00005.safetensors", "model.layers.23.self_attn.q_proj.weight": "model-00003-of-00005.safetensors", "model.layers.23.self_attn.v_proj.weight": "model-00003-of-00005.safetensors", "model.layers.24.input_layernorm.weight": "model-00003-of-00005.safetensors", "model.layers.24.mlp.down_proj.weight": "model-00003-of-00005.safetensors", "model.layers.24.mlp.gate_proj.weight": "model-00003-of-00005.safetensors", "model.layers.24.mlp.up_proj.weight": "model-00003-of-00005.safetensors", "model.layers.24.post_attention_layernorm.weight": "model-00003-of-00005.safetensors", "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00005.safetensors", "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00005.safetensors", "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00005.safetensors", "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00005.safetensors", "model.layers.25.input_layernorm.weight": "model-00003-of-00005.safetensors", "model.layers.25.mlp.down_proj.weight": "model-00003-of-00005.safetensors", "model.layers.25.mlp.gate_proj.weight": "model-00003-of-00005.safetensors", "model.layers.25.mlp.up_proj.weight": "model-00003-of-00005.safetensors", "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00005.safetensors", "model.layers.25.self_attn.k_proj.weight": "model-00003-of-00005.safetensors", "model.layers.25.self_attn.o_proj.weight": "model-00003-of-00005.safetensors", "model.layers.25.self_attn.q_proj.weight": "model-00003-of-00005.safetensors", "model.layers.25.self_attn.v_proj.weight": "model-00003-of-00005.safetensors", "model.layers.26.input_layernorm.weight": "model-00003-of-00005.safetensors", "model.layers.26.mlp.down_proj.weight": "model-00003-of-00005.safetensors", "model.layers.26.mlp.gate_proj.weight": "model-00003-of-00005.safetensors", "model.layers.26.mlp.up_proj.weight": 
"model-00003-of-00005.safetensors", "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00005.safetensors", "model.layers.26.self_attn.k_proj.weight": "model-00003-of-00005.safetensors", "model.layers.26.self_attn.o_proj.weight": "model-00003-of-00005.safetensors", "model.layers.26.self_attn.q_proj.weight": "model-00003-of-00005.safetensors", "model.layers.26.self_attn.v_proj.weight": "model-00003-of-00005.safetensors", "model.layers.27.input_layernorm.weight": "model-00003-of-00005.safetensors", "model.layers.27.mlp.down_proj.weight": "model-00003-of-00005.safetensors", "model.layers.27.mlp.gate_proj.weight": "model-00003-of-00005.safetensors", "model.layers.27.mlp.up_proj.weight": "model-00003-of-00005.safetensors", "model.layers.27.post_attention_layernorm.weight": "model-00003-of-00005.safetensors", "model.layers.27.self_attn.k_proj.weight": "model-00003-of-00005.safetensors", "model.layers.27.self_attn.o_proj.weight": "model-00003-of-00005.safetensors", "model.layers.27.self_attn.q_proj.weight": "model-00003-of-00005.safetensors", "model.layers.27.self_attn.v_proj.weight": "model-00003-of-00005.safetensors", "model.layers.28.input_layernorm.weight": "model-00003-of-00005.safetensors", "model.layers.28.mlp.down_proj.weight": "model-00003-of-00005.safetensors", "model.layers.28.mlp.gate_proj.weight": "model-00003-of-00005.safetensors", "model.layers.28.mlp.up_proj.weight": "model-00003-of-00005.safetensors", "model.layers.28.post_attention_layernorm.weight": "model-00003-of-00005.safetensors", "model.layers.28.self_attn.k_proj.weight": "model-00003-of-00005.safetensors", "model.layers.28.self_attn.o_proj.weight": "model-00003-of-00005.safetensors", "model.layers.28.self_attn.q_proj.weight": "model-00003-of-00005.safetensors", "model.layers.28.self_attn.v_proj.weight": "model-00003-of-00005.safetensors", "model.layers.29.input_layernorm.weight": "model-00003-of-00005.safetensors", "model.layers.29.mlp.down_proj.weight": 
"model-00004-of-00005.safetensors", "model.layers.29.mlp.gate_proj.weight": "model-00004-of-00005.safetensors", "model.layers.29.mlp.up_proj.weight": "model-00004-of-00005.safetensors", "model.layers.29.post_attention_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.29.self_attn.k_proj.weight": "model-00004-of-00005.safetensors", "model.layers.29.self_attn.o_proj.weight": "model-00004-of-00005.safetensors", "model.layers.29.self_attn.q_proj.weight": "model-00004-of-00005.safetensors", "model.layers.29.self_attn.v_proj.weight": "model-00004-of-00005.safetensors", "model.layers.3.input_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.3.mlp.down_proj.weight": "model-00004-of-00005.safetensors", "model.layers.3.mlp.gate_proj.weight": "model-00004-of-00005.safetensors", "model.layers.3.mlp.up_proj.weight": "model-00004-of-00005.safetensors", "model.layers.3.post_attention_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.3.self_attn.k_proj.weight": "model-00004-of-00005.safetensors", "model.layers.3.self_attn.o_proj.weight": "model-00004-of-00005.safetensors", "model.layers.3.self_attn.q_proj.weight": "model-00004-of-00005.safetensors", "model.layers.3.self_attn.v_proj.weight": "model-00004-of-00005.safetensors", "model.layers.30.input_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.30.mlp.down_proj.weight": "model-00004-of-00005.safetensors", "model.layers.30.mlp.gate_proj.weight": "model-00004-of-00005.safetensors", "model.layers.30.mlp.up_proj.weight": "model-00004-of-00005.safetensors", "model.layers.30.post_attention_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.30.self_attn.k_proj.weight": "model-00004-of-00005.safetensors", "model.layers.30.self_attn.o_proj.weight": "model-00004-of-00005.safetensors", "model.layers.30.self_attn.q_proj.weight": "model-00004-of-00005.safetensors", "model.layers.30.self_attn.v_proj.weight": "model-00004-of-00005.safetensors", 
"model.layers.31.input_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.31.mlp.down_proj.weight": "model-00004-of-00005.safetensors", "model.layers.31.mlp.gate_proj.weight": "model-00004-of-00005.safetensors", "model.layers.31.mlp.up_proj.weight": "model-00004-of-00005.safetensors", "model.layers.31.post_attention_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.31.self_attn.k_proj.weight": "model-00004-of-00005.safetensors", "model.layers.31.self_attn.o_proj.weight": "model-00004-of-00005.safetensors", "model.layers.31.self_attn.q_proj.weight": "model-00004-of-00005.safetensors", "model.layers.31.self_attn.v_proj.weight": "model-00004-of-00005.safetensors", "model.layers.32.input_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.32.mlp.down_proj.weight": "model-00004-of-00005.safetensors", "model.layers.32.mlp.gate_proj.weight": "model-00004-of-00005.safetensors", "model.layers.32.mlp.up_proj.weight": "model-00004-of-00005.safetensors", "model.layers.32.post_attention_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.32.self_attn.k_proj.weight": "model-00004-of-00005.safetensors", "model.layers.32.self_attn.o_proj.weight": "model-00004-of-00005.safetensors", "model.layers.32.self_attn.q_proj.weight": "model-00004-of-00005.safetensors", "model.layers.32.self_attn.v_proj.weight": "model-00004-of-00005.safetensors", "model.layers.33.input_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.33.mlp.down_proj.weight": "model-00004-of-00005.safetensors", "model.layers.33.mlp.gate_proj.weight": "model-00004-of-00005.safetensors", "model.layers.33.mlp.up_proj.weight": "model-00004-of-00005.safetensors", "model.layers.33.post_attention_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.33.self_attn.k_proj.weight": "model-00004-of-00005.safetensors", "model.layers.33.self_attn.o_proj.weight": "model-00004-of-00005.safetensors", 
"model.layers.33.self_attn.q_proj.weight": "model-00004-of-00005.safetensors", "model.layers.33.self_attn.v_proj.weight": "model-00004-of-00005.safetensors", "model.layers.34.input_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.34.mlp.down_proj.weight": "model-00004-of-00005.safetensors", "model.layers.34.mlp.gate_proj.weight": "model-00004-of-00005.safetensors", "model.layers.34.mlp.up_proj.weight": "model-00004-of-00005.safetensors", "model.layers.34.post_attention_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.34.self_attn.k_proj.weight": "model-00004-of-00005.safetensors", "model.layers.34.self_attn.o_proj.weight": "model-00004-of-00005.safetensors", "model.layers.34.self_attn.q_proj.weight": "model-00004-of-00005.safetensors", "model.layers.34.self_attn.v_proj.weight": "model-00004-of-00005.safetensors", "model.layers.35.input_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.35.mlp.down_proj.weight": "model-00004-of-00005.safetensors", "model.layers.35.mlp.gate_proj.weight": "model-00004-of-00005.safetensors", "model.layers.35.mlp.up_proj.weight": "model-00004-of-00005.safetensors", "model.layers.35.post_attention_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.35.self_attn.k_proj.weight": "model-00004-of-00005.safetensors", "model.layers.35.self_attn.o_proj.weight": "model-00004-of-00005.safetensors", "model.layers.35.self_attn.q_proj.weight": "model-00004-of-00005.safetensors", "model.layers.35.self_attn.v_proj.weight": "model-00004-of-00005.safetensors", "model.layers.36.input_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.36.mlp.down_proj.weight": "model-00004-of-00005.safetensors", "model.layers.36.mlp.gate_proj.weight": "model-00004-of-00005.safetensors", "model.layers.36.mlp.up_proj.weight": "model-00004-of-00005.safetensors", "model.layers.36.post_attention_layernorm.weight": "model-00004-of-00005.safetensors", 
"model.layers.36.self_attn.k_proj.weight": "model-00004-of-00005.safetensors", "model.layers.36.self_attn.o_proj.weight": "model-00004-of-00005.safetensors", "model.layers.36.self_attn.q_proj.weight": "model-00004-of-00005.safetensors", "model.layers.36.self_attn.v_proj.weight": "model-00004-of-00005.safetensors", "model.layers.37.input_layernorm.weight": "model-00004-of-00005.safetensors", "model.layers.37.mlp.down_proj.weight": "model-00005-of-00005.safetensors", "model.layers.37.mlp.gate_proj.weight": "model-00005-of-00005.safetensors", "model.layers.37.mlp.up_proj.weight": "model-00005-of-00005.safetensors", "model.layers.37.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.37.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.37.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.37.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.37.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.38.input_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.38.mlp.down_proj.weight": "model-00005-of-00005.safetensors", "model.layers.38.mlp.gate_proj.weight": "model-00005-of-00005.safetensors", "model.layers.38.mlp.up_proj.weight": "model-00005-of-00005.safetensors", "model.layers.38.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.38.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.38.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.38.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.38.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.39.input_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.39.mlp.down_proj.weight": "model-00005-of-00005.safetensors", "model.layers.39.mlp.gate_proj.weight": "model-00005-of-00005.safetensors", "model.layers.39.mlp.up_proj.weight": 
"model-00005-of-00005.safetensors", "model.layers.39.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.39.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.39.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.39.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.39.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.4.input_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.4.mlp.down_proj.weight": "model-00005-of-00005.safetensors", "model.layers.4.mlp.gate_proj.weight": "model-00005-of-00005.safetensors", "model.layers.4.mlp.up_proj.weight": "model-00005-of-00005.safetensors", "model.layers.4.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.4.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.4.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.4.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.4.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.5.input_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.5.mlp.down_proj.weight": "model-00005-of-00005.safetensors", "model.layers.5.mlp.gate_proj.weight": "model-00005-of-00005.safetensors", "model.layers.5.mlp.up_proj.weight": "model-00005-of-00005.safetensors", "model.layers.5.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.5.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.5.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.5.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.5.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.6.input_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.6.mlp.down_proj.weight": "model-00005-of-00005.safetensors", 
"model.layers.6.mlp.gate_proj.weight": "model-00005-of-00005.safetensors", "model.layers.6.mlp.up_proj.weight": "model-00005-of-00005.safetensors", "model.layers.6.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.6.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.6.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.6.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.6.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.7.input_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.7.mlp.down_proj.weight": "model-00005-of-00005.safetensors", "model.layers.7.mlp.gate_proj.weight": "model-00005-of-00005.safetensors", "model.layers.7.mlp.up_proj.weight": "model-00005-of-00005.safetensors", "model.layers.7.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.7.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.7.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.7.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.7.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.8.input_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.8.mlp.down_proj.weight": "model-00005-of-00005.safetensors", "model.layers.8.mlp.gate_proj.weight": "model-00005-of-00005.safetensors", "model.layers.8.mlp.up_proj.weight": "model-00005-of-00005.safetensors", "model.layers.8.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.8.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.8.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.8.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.8.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.layers.9.input_layernorm.weight": 
"model-00005-of-00005.safetensors", "model.layers.9.mlp.down_proj.weight": "model-00005-of-00005.safetensors", "model.layers.9.mlp.gate_proj.weight": "model-00005-of-00005.safetensors", "model.layers.9.mlp.up_proj.weight": "model-00005-of-00005.safetensors", "model.layers.9.post_attention_layernorm.weight": "model-00005-of-00005.safetensors", "model.layers.9.self_attn.k_proj.weight": "model-00005-of-00005.safetensors", "model.layers.9.self_attn.o_proj.weight": "model-00005-of-00005.safetensors", "model.layers.9.self_attn.q_proj.weight": "model-00005-of-00005.safetensors", "model.layers.9.self_attn.v_proj.weight": "model-00005-of-00005.safetensors", "model.norm.weight": "model-00005-of-00005.safetensors"}}
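The index above is the standard `model.safetensors.index.json` layout: its `weight_map` maps each tensor name to the shard file that stores it, so a loader only has to open the shards it needs. A minimal sketch of that lookup, using a hypothetical two-entry excerpt of the map (the helper name `shard_for` is illustrative, not part of any library):

```python
# Sketch of how a loader resolves tensors through "weight_map" in
# model.safetensors.index.json. The dict below is a tiny excerpt of the
# full index shown above; in practice it would be json.load()-ed from disk.
weight_map = {
    "model.layers.39.self_attn.q_proj.weight": "model-00005-of-00005.safetensors",
    "model.norm.weight": "model-00005-of-00005.safetensors",
}

def shard_for(tensor_name: str) -> str:
    """Return the shard filename that contains the given tensor."""
    return weight_map[tensor_name]

print(shard_for("model.norm.weight"))  # model-00005-of-00005.safetensors
```

A real loader groups tensor names by shard file first, then opens each shard once and reads all of its tensors in one pass.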
special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
+ {
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
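Each entry in `special_tokens_map.json` is either a bare string or an object whose `content` field holds the literal token text (here, `<|im_end|>` as EOS and `<pad>` as the pad token). A minimal sketch of reading those strings back out, parsing the same JSON shown in the diff (the helper name `token_string` is illustrative):

```python
import json

# Parse the special_tokens_map.json content added in this commit and
# extract the literal string for each special token.
special_tokens = json.loads("""{
  "bos_token": {"content": "<s>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false},
  "eos_token": {"content": "<|im_end|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false},
  "pad_token": {"content": "<pad>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false},
  "unk_token": {"content": "<unk>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false}
}""")

def token_string(name: str) -> str:
    """Return the literal token text; entries may be dicts or bare strings."""
    entry = special_tokens[name]
    return entry["content"] if isinstance(entry, dict) else entry

print(token_string("eos_token"))  # <|im_end|>
```

In normal use you would not parse this file yourself; `AutoTokenizer.from_pretrained` reads it and exposes the values as `tokenizer.eos_token`, `tokenizer.pad_token`, and so on.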
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4b4c8fcd33487a449c07f423d47adb035bba8347ccf13eb074b4d1fef8acf919
+ size 17078288
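The three lines above are not the tokenizer itself but a Git LFS pointer: a `key value` line each for the spec version, the SHA-256 object ID, and the byte size of the real file stored out-of-band. A sketch of parsing that format (the function name `parse_lfs_pointer` is illustrative):

```python
# Parse a Git LFS pointer file (like the tokenizer.json stub above) into a
# dict. Each line is a key, a single space, and a value.
def parse_lfs_pointer(text: str) -> dict:
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:4b4c8fcd33487a449c07f423d47adb035bba8347ccf13eb074b4d1fef8acf919
size 17078288"""

info = parse_lfs_pointer(pointer)
print(info["size"])  # 17078288
```

When the repository is cloned with `git lfs` installed, the smudge filter replaces this pointer with the actual ~17 MB `tokenizer.json`, verifying it against the `oid` hash.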
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff