# MS3-test-Merge-1

I hadn't tried untuned MS3 before messing around with this merge, but I suspect it isn't all that different from the result. There is some influence from the tuned adapters, just less than I expected. That might be for the better, though: the result is usable as-is.

I'll use this as part of upcoming merges when there's enough fuel.

## Merge Details

### Step1

```yaml
models:
  - model: unsloth/Mistral-Small-24B-Base-2501
  - model: unsloth/Mistral-Small-24B-Instruct-2501+ToastyPigeon/new-ms-rp-test-ws
    parameters:
      select_topk:
        - value: [0.05, 0.03, 0.02, 0.02, 0.01]
  - model: unsloth/Mistral-Small-24B-Instruct-2501+estrogen/MS2501-24b-Ink-ep2-adpt
    parameters:
      select_topk: 0.1
  - model: trashpanda-org/MS-24B-Instruct-Mullein-v0
    parameters:
      select_topk: 0.4
base_model: unsloth/Mistral-Small-24B-Base-2501
merge_method: sce
parameters:
  int8_mask: true
  rescale: true
  normalize: true
dtype: bfloat16
tokenizer_source: base
```
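
The `+` entries stack a LoRA adapter on top of the Instruct model inline, and `select_topk` keeps only a small top fraction of each candidate's deltas for the SCE fusion, which is probably why the tuned adapters' influence stays subtle. The card doesn't show the invocation, but a config like this is normally run through mergekit; here is a minimal sketch using mergekit's Python API, with assumed file and output names (`step1.yml`, `./Step1`):

```python
# Sketch: running the Step1 config through mergekit's Python API.
# "step1.yml" and "./Step1" are assumed names, not from the card.
import yaml
import torch

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("step1.yml", encoding="utf-8") as fp:
    config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    config,
    out_path="./Step1",  # Step2's config refers to this output by name
    options=MergeOptions(
        cuda=torch.cuda.is_available(),
        copy_tokenizer=True,
        lazy_unpickle=True,  # lower peak RAM while reading checkpoints
    ),
)
```

The CLI equivalent would be `mergekit-yaml step1.yml ./Step1 --cuda`.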

### Step2

```yaml
dtype: bfloat16
tokenizer_source: base
merge_method: della_linear
parameters:
  density: 0.55
base_model: Step1
models:
  - model: unsloth/Mistral-Small-24B-Instruct-2501
    parameters:
      weight:
        - filter: v_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: o_proj
          value: [1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1]
        - filter: up_proj
          value: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
        - filter: gate_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: down_proj
          value: [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
        - value: 0
  - model: Step1
    parameters:
      weight:
        - filter: v_proj
          value: [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
        - filter: o_proj
          value: [0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0]
        - filter: up_proj
          value: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
        - filter: gate_proj
          value: [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
        - filter: down_proj
          value: [0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1]
        - value: 1
```
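
The per-filter `weight` lists are mergekit gradients: an 11-point anchor list gets linearly interpolated across the model's layers, so these 0/1 patterns fade between the Instruct model and the Step1 merge rather than switching hard at layer boundaries. Note that the two models' anchors are complementary for every filter (and for the catch-all `value`), so the per-layer weights always sum to 1. A rough illustration of that interpolation, with a hypothetical `gradient_weight` helper; this is my reading of how gradients behave, not mergekit's actual code:

```python
# Illustration: how an 11-point gradient list maps to per-layer weights.
# gradient_weight() is a hypothetical helper, not a mergekit function.
def gradient_weight(anchors: list[float], layer: int, num_layers: int) -> float:
    """Linearly interpolate the anchor list across layer indices."""
    t = layer / max(num_layers - 1, 1)  # layer position in [0, 1]
    pos = t * (len(anchors) - 1)        # position along the anchor list
    lo, frac = int(pos), pos - int(pos)
    hi = min(lo + 1, len(anchors) - 1)
    return anchors[lo] * (1.0 - frac) + anchors[hi] * frac

v_proj_instruct = [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
v_proj_step1    = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
for layer in (0, 5, 20, 39):  # Mistral-Small-24B has 40 decoder layers
    a = gradient_weight(v_proj_instruct, layer, 40)
    b = gradient_weight(v_proj_step1, layer, 40)
    print(f"layer {layer:2d}: instruct={a:.2f}  step1={b:.2f}  sum={a + b:.2f}")
```

Step2 is then run the same way as Step1, with `Step1` in the config pointing at the local output directory from the first pass.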