Merge recipe to produce the intermediate models?
Do you mind sharing the merge recipe you used to merge Athene and turbocat with Llama 3.1, or did you use the same SLERP template for all three merges? Did you experiment with any other approaches?
I'm downloading a GGUF of your model now to test it out. Thanks for sharing the result and the method you used.
The template was the same for every merge; I just changed the file names. I did try other approaches, but this one came out the best.
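For anyone following along who hasn't seen one of these, the general shape of a mergekit SLERP template where only the file names change between merges looks roughly like this. This is just an illustration with placeholder paths and a flat t value, not the exact template used for the Athene and turbocat merges:

models:
  - model: /path/to/finetuned-model               # placeholder: the finetune being merged in
  - model: /path/to/Meta-Llama-3.1-70B-Instruct   # placeholder
merge_method: slerp
base_model: /path/to/Meta-Llama-3.1-70B-Instruct  # placeholder
parameters:
  t:
    - value: 0.5   # constant 50/50 interpolation; a list here would define a per-layer gradient
dtype: float16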
Thanks for sharing!
I had some success with this template. I used it to make a merge of Llama 3.1 with my New-Dawn model that I think came out reasonably well. The goal was to retain Llama 3.1's longer context capabilities, and that seems to have worked. I'm going to upload it soon.
merge_method: della_linear
base_model: /home/llm/mergequant/models/BASE/Meta-Llama-3.1-70B-Instruct
models:
  - model: /home/llm/mergequant/models/new-dawn-llama3-70b-32K-v1.0
    parameters:
      weight:
        - filter: v_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: o_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: up_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: gate_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: down_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - value: 0
      density: 0.25
      epsilon: 0.05
      lambda: 1.0
  - model: /home/llm/mergequant/models/BASE/Meta-Llama-3.1-70B-Instruct
    parameters:
      weight: 1.0
      density:
        - filter: v_proj
          value: [1, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1, 1]
        - filter: o_proj
          value: [1, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1, 1]
        - filter: up_proj
          value: [1, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1, 1]
        - filter: gate_proj
          value: [1, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1, 1]
        - filter: down_proj
          value: [1, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1, 1]
        - value: 0.5
      epsilon:
        - filter: v_proj
          value: [0, 0, 0.05, 0.05, 0.07, 0.1, 0.07, 0.05, 0.05, 0, 0]
        - filter: o_proj
          value: [0, 0, 0.05, 0.05, 0.07, 0.1, 0.07, 0.05, 0.05, 0, 0]
        - filter: up_proj
          value: [0, 0, 0.05, 0.05, 0.07, 0.1, 0.07, 0.05, 0.05, 0, 0]
        - filter: gate_proj
          value: [0, 0, 0.05, 0.05, 0.07, 0.1, 0.07, 0.05, 0.05, 0, 0]
        - filter: down_proj
          value: [0, 0, 0.05, 0.05, 0.07, 0.1, 0.07, 0.05, 0.05, 0, 0]
        - value: 0.1
      lambda: 1.0
dtype: float16
tokenizer_source: base
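The bracketed lists are per-layer gradients: mergekit interpolates them across the layer stack, so the zeros at the ends keep the first and last blocks of layers pinned to Llama 3.1 while the middle layers take New-Dawn's weights, and the trailing plain value is the default for any tensor the filters don't match. If you just want to play with that layer-gradient idea without the full della_linear setup, a plain SLERP with a gradient on t would look something like this. It's a rough sketch with placeholder paths, not one of the recipes in this post:

models:
  - model: /path/to/your-llama3-finetune          # placeholder
  - model: /path/to/Meta-Llama-3.1-70B-Instruct   # placeholder
merge_method: slerp
base_model: /path/to/Meta-Llama-3.1-70B-Instruct  # placeholder
parameters:
  t:
    # interpolated across the layers: 0 keeps the base model (Llama 3.1),
    # so the outer blocks stay untouched while the middle is blended 50/50
    - value: [0, 0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0, 0]
dtype: float16
tokenizer_source: base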
I also produced a coherent SLERP merge using the two-step recipe below that retained the long-context capabilities, although it didn't perform as well in my subjective testing. You could copy the second step of the merge if you wanted to produce a long-context version of your model.
name: _newdawn_pre_merge
models:
  - model: /home/llm/mergequant/models/new-dawn-llama3-70b-32K-v1.0
  - model: /home/llm/mergequant/models/BASE/Meta-Llama-3.1-70B-Instruct
merge_method: slerp
base_model: /home/llm/mergequant/models/BASE/Meta-Llama-3.1-70B-Instruct
parameters:
  t:
    - value: 0.5
dtype: float16
---
# See https://huggingface.co./jukofyork/Dark-Miqu-70B/discussions/3
# Credit for merge recipe belongs to jukofyork
name: new-dawn-llama3.1-70b-v1.1
merge_method: linear
models:
  - model: /home/llm/mergequant/models/BASE/Meta-Llama-3.1-70B-Instruct
    parameters:
      weight:
        - filter: v_proj
          value: [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
        - filter: o_proj
          value: [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
        - filter: up_proj
          value: [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
        - filter: gate_proj
          value: [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
        - filter: down_proj
          value: [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
        - value: 1
  - model: _newdawn_pre_merge
    parameters:
      weight:
        - filter: v_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: o_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: up_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: gate_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: down_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - value: 0
base_model: /home/llm/mergequant/models/BASE/Meta-Llama-3.1-70B-Instruct
tokenizer_source: base
dtype: float16
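For example, copying the second step and pointing it at your own model instead of the SLERP pre-merge might look roughly like this. The paths and output name are placeholders and I haven't tested this exact variation, so treat it as a sketch rather than a known-good recipe:

name: your-model-llama3.1-longctx                   # placeholder output name
merge_method: linear
models:
  - model: /path/to/Meta-Llama-3.1-70B-Instruct     # placeholder path
    parameters:
      weight:
        - filter: v_proj
          value: [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
        - filter: o_proj
          value: [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
        - filter: up_proj
          value: [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
        - filter: gate_proj
          value: [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
        - filter: down_proj
          value: [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
        - value: 1
  - model: /path/to/your-llama3-model               # placeholder: your model in place of _newdawn_pre_merge
    parameters:
      weight:
        - filter: v_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: o_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: up_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: gate_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: down_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - value: 0
base_model: /path/to/Meta-Llama-3.1-70B-Instruct    # placeholder path
tokenizer_source: base
dtype: float16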
I hope something in here proves to be interesting or helpful in your own experiments.