---
base_model:
- ArliAI/ArliAI-Llama-3-8B-Formax-v1.0
- Sao10K/L3.1-8B-Niitama-v1.1
- Sao10K/L3-8B-Tamamo-v1
- Sao10K/L3-8B-Stheno-v3.3-32K
- Edgerunners/Lyraea-large-llama-3.1
library_name: transformers
tags:
- mergekit
- merge
---
My first foray into Llama 3.1 and just having fun with the merging process. Testing theories and such.
Updated version with higher context here.
## Quants
OG Q8 GGUF by me.
## Details & Recommended Settings
Unfortunately, this model still double-lines, though not as often. Dramatic as fuck at times. I haven't tested the context limit yet, but I'm sure it suffered somehow.
Outputs a lot, pretty chatty like Stheno. Pulls some chaotic creativity from Niitama, but it's mellowed out by Tamamo. A little cliché in its writing, but it's almost endearing in a way. Should follow instructs fine, though it's a little stunted compared to the original model; I don't think that's a negative.
4K max context, even on L3.1 (DAMN U FORMAX).
Rec. Settings (sketched in code after the list):
- Template: L3
- Temperature: 1.35
- Min P: 0.1
- Repeat Penalty: 1.05
- Repeat Penalty Tokens: 256
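If you're running the Q8 GGUF, these settings map fairly directly onto llama-cpp-python's sampler arguments. A minimal sketch under that assumption; the model filename is a placeholder, and `min_p` needs a reasonably recent build of the bindings:

```python
# Rough sketch of the recommended samplers with llama-cpp-python.
# The GGUF filename below is a placeholder; point it at wherever you saved the quant.
from llama_cpp import Llama

llm = Llama(
    model_path="./siitamo-formax-q8_0.gguf",  # placeholder path
    n_ctx=4096,                # 4K is the practical ceiling for this merge
    last_n_tokens_size=256,    # "Repeat Penalty Tokens: 256"
)

out = llm.create_chat_completion(
    # The L3 chat template should be picked up from the GGUF metadata.
    messages=[{"role": "user", "content": "Write a short, dramatic scene."}],
    max_tokens=256,
    temperature=1.35,
    min_p=0.1,
    top_p=1.0,          # leave other truncation samplers neutral
    repeat_penalty=1.05,
)
print(out["choices"][0]["message"]["content"])
```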
## Models Merged & Merge Theory
The following models were included in the merge:
- Edgerunners/Lyraea-large-llama-3.1
- Sao10K/L3-8B-Stheno-v3.3-32K
- Sao10K/L3.1-8B-Niitama-v1.1
- Sao10K/L3-8B-Tamamo-v1
- ArliAI/ArliAI-Llama-3-8B-Formax-v1.0
Using Edgerunners' Lyraea as the 3.1 base, L3.1 Niitama, Stheno 3.3, and Tamamo were model-stock merged atop each other. Then, to curb the L3 tendencies and add some instruct-following capability, a bit of Formax was mixed in with a dare_linear merge. At least for updating L3 to L3.1, anything TIES-based results in a 'shittier' model.
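To make the dare_linear weight lists in the config below a bit more concrete: as I understand mergekit, a list like `[0.5, 0.3, 0.2, 0.1]` is treated as a gradient interpolated across the layer stack, so Formax contributes most in the early layers and tapers off toward the output. A toy illustration of that idea (not mergekit's actual code):

```python
# Toy illustration (not mergekit's implementation): spreading a short weight
# gradient across a 32-layer model by linear interpolation between anchor points.
import numpy as np

def gradient_to_layer_weights(gradient, num_layers=32):
    """Interpolate a short gradient list into one weight per layer."""
    anchors = np.linspace(0, num_layers - 1, num=len(gradient))
    return np.interp(np.arange(num_layers), anchors, gradient)

formax = gradient_to_layer_weights([0.5, 0.3, 0.2, 0.1])
siitamo = gradient_to_layer_weights([0.5, 0.7, 0.8, 1.0])

for i in (0, 10, 21, 31):
    print(f"layer {i:2d}: formax={formax[i]:.2f}  siitamol3.1={siitamo[i]:.2f}")
```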
## Config
```yaml
models:
  - model: Sao10K/L3.1-8B-Niitama-v1.1
  - model: Sao10K/L3-8B-Stheno-v3.3-32K
  - model: Sao10K/L3-8B-Tamamo-v1
base_model: Edgerunners/Lyraea-large-llama-3.1
parameters:
  normalize: false
  int8_mask: true
merge_method: model_stock
dtype: float32
out_dtype: bfloat16
name: siitamol3.1
---
models:
  - model: ArliAI/ArliAI-Llama-3-8B-Formax-v1.0
    parameters:
      weight: [0.5, 0.3, 0.2, 0.1]
  - model: siitamol3.1
    parameters:
      weight: [0.5, 0.7, 0.8, 1]
base_model: siitamol3.1
parameters:
  normalize: false
  int8_mask: true
merge_method: dare_linear
dtype: float32
out_dtype: bfloat16
```
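For anyone wanting to reproduce this, the two YAML documents above run as two stages, with the second consuming the first's output under the `siitamol3.1` name. A rough sketch using mergekit's Python API, assuming each stage is saved to its own file and that the second stage's `siitamol3.1` references point at the stage-1 output directory; all paths and filenames are placeholders:

```python
# Rough sketch: running the two merge stages back to back with mergekit's
# Python API. File names and output paths are placeholders.
import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

options = MergeOptions(copy_tokenizer=True, lazy_unpickle=True, low_cpu_memory=True)

# Stage 1: model_stock of Niitama / Stheno / Tamamo on the Lyraea 3.1 base.
with open("stage1_model_stock.yaml") as f:
    stage1 = MergeConfiguration.model_validate(yaml.safe_load(f))
run_merge(stage1, "./siitamol3.1", options=options)

# Stage 2: dare_linear of Formax into the stage-1 output, using the
# layer-weight gradients from the config above. In this stage's YAML the
# "siitamol3.1" model references should point at ./siitamol3.1.
with open("stage2_dare_linear.yaml") as f:
    stage2 = MergeConfiguration.model_validate(yaml.safe_load(f))
run_merge(stage2, "./siitamo-formax-final", options=options)
```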