# merge

This is a merge of pre-trained language models created using mergekit.

## Merge Details

### Merge Method

This model was merged using the DARE TIES merge method, with CultriX/Qwen2.5-14B-Wernickev3 as the base model.
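
For intuition: DARE randomly drops a fraction of each fine-tuned model's delta from the base and rescales the surviving entries, and TIES then resolves sign conflicts between models before the combined delta is added back onto the base. The sketch below is a deliberately simplified illustration of that idea for a single parameter tensor; it is not mergekit's implementation, and the helper name, normalization details, and demo values are assumptions.

```python
# Toy illustration of the DARE TIES idea (simplified; not mergekit's code).
import torch

def dare_ties_merge(base: torch.Tensor,
                    finetuned: list[torch.Tensor],
                    weights: list[float],
                    density: float = 0.7,
                    lambda_: float = 1.0) -> torch.Tensor:
    """Merge one parameter tensor from several fine-tunes onto a shared base."""
    deltas = []
    for ft, w in zip(finetuned, weights):
        delta = ft - base                        # task vector vs. the base model
        keep = torch.rand_like(delta) < density  # DARE: keep ~density of entries
        delta = delta * keep / density           # rescale survivors to preserve scale
        deltas.append(w * delta)                 # apply per-model weight

    stacked = torch.stack(deltas)                # (num_models, *param_shape)
    elected = torch.sign(stacked.sum(dim=0))     # TIES: elect a majority sign per parameter
    agree = torch.sign(stacked) == elected       # keep only sign-consistent deltas
    merged = torch.where(agree, stacked, torch.zeros_like(stacked)).sum(dim=0)

    return base + lambda_ * merged               # lambda scales the merged delta

# Tiny demo with made-up numbers.
torch.manual_seed(0)
base = torch.zeros(4)
fts = [base + torch.tensor([0.20, -0.10, 0.30, 0.00]),
       base + torch.tensor([0.10, 0.20, 0.25, -0.05])]
print(dare_ties_merge(base, fts, weights=[0.6, 0.4], density=0.75))
```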

### Models Merged

The following models were included in the merge:

* djuna/Q2.5-Veltha-14B-0.5
* allknowingroger/QwenSlerp6-14B
* CultriX/SeQwence-14B-EvolMerge
* qingy2024/Fusion4-14B-Instruct
* CultriX/Qwen2.5-14B-Emerged
* sometimesanotion/Lamarck-14B-v0.6
* hotmailuser/QwenSlerp2-14B

### Configuration

The following YAML configuration was used to produce this model:

```yaml
merge_method: dare_ties
base_model: CultriX/Qwen2.5-14B-Wernickev3
dtype: bfloat16      # Compute in bfloat16 to balance precision and memory use.
out_dtype: bfloat16  # Output model also uses bfloat16 for consistency and reduced memory usage.

parameters:
  t: 0.5  # Balances interpolation between models; 0.5 gives equal weight to all contributors.
  normalize: true  # Ensures parameters are normalized to maintain stability during merging.
  rescale: true  # Aligns parameter scales across models for better integration.
  int8_mask: false  # Disable int8 masking to preserve full precision during merging.
  epsilon: 0.008  # Ultra-fine parameter scaling for precise adjustments between models.
  lambda: 1.8  # Emphasizes high-impact parameters, giving more weight to significant contributors.

adaptive_merge_parameters:
  task_weights:  # Assign weights to tasks based on their priority and impact on benchmarks.
    tinyArc: 1.6  # Logical reasoning benchmark; slightly lower priority.
    tinyHellaswag: 1.5  # Contextual reasoning benchmark with moderate priority.
    tinyMMLU: 1.8  # Multi-domain knowledge benchmark; important for multitask performance.
    tinyTruthfulQA: 1.9  # Focuses on factual reasoning and QA; high priority.
    tinyTruthfulQA_mc1: 1.75  # Multiple-choice factual reasoning; closely related to TruthfulQA.
    tinyWinogrande: 1.75  # Core reasoning benchmark; slightly lower than BBH.
    IFEval: 2.30  # Instruction-following tasks; given a high priority for practical applications.
    BBH: 2.05  # Complex reasoning benchmark; critical for logical tasks.
    MATH: 2.70  # Highest priority to emphasize mathematical reasoning excellence.
    GPQA: 2.20  # Graduate-level QA tasks; balanced priority for high-level reasoning.
    MUSR: 2.15  # Multi-step reasoning; slightly increased to strengthen reasoning performance.
    MMLU-PRO: 2.00  # Domain multitask benchmark; maintained for general multitask capability.
  smoothing_factor: 0.03  # Low smoothing for precise task-specific blending without over-generalizing.

gradient_clipping:  # Per-model gradient clipping values to stabilize the merge.
  CultriX/Qwen2.5-14B-Wernickev3: 0.89  # Higher value ensures stability for the base model.
  djuna/Q2.5-Veltha-14B-0.5: 0.92  # Stable setting to enhance reasoning contributions.
  CultriX/SeQwence-14B-EvolMerge: 0.87  # Moderate value for generalist multitask support.
  qingy2024/Fusion4-14B-Instruct: 0.93  # High stability to emphasize mathematical tasks.
  CultriX/Qwen2.5-14B-Emerged: 0.88  # Stable setting to maintain multitask performance.
  sometimesanotion/Lamarck-14B-v0.6: 0.89  # Stable contribution for multi-step reasoning.
  allknowingroger/QwenSlerp6-14B: 0.90  # Adjusted for stable integration of the replacement model.
  hotmailuser/QwenSlerp2-14B: 0.91  # Increased slightly for stable integration of reasoning contributions.

models:  # Define models to include in the merge, along with their weights and densities.
  - model: CultriX/Qwen2.5-14B-Wernickev3
    parameters:
      weight: 0.33  # Increased to absorb some of the weight from the removed model.
      density: 0.78  # Maintained optimal density for robust generalist performance.

  - model: djuna/Q2.5-Veltha-14B-0.5
    parameters:
      weight: 0.28  # Increased slightly to enhance reasoning benchmarks like MUSR.
      density: 0.77  # Maintained for strong nuanced reasoning.

  - model: allknowingroger/QwenSlerp6-14B  # Replacement for Qwenfinity-2.5-14B.
    parameters:
      weight: 0.15  # Matches the weight of the replaced model to preserve balance.
      density: 0.70  # Increased slightly for stronger parameter integration.

  - model: CultriX/SeQwence-14B-EvolMerge
    parameters:
      weight: 0.12  # Moderate weight for general multitask support.
      density: 0.62  # Maintained for stable contribution.

  - model: qingy2024/Fusion4-14B-Instruct
    parameters:
      weight: 0.09  # Moderate weight; focuses on mathematical reasoning tasks.
      density: 0.75  # Maintained density for stable integration.

  - model: CultriX/Qwen2.5-14B-Emerged
    parameters:
      weight: 0.08  # Balanced weight for multitask contributions.
      density: 0.69  # Maintained density for stable integration.

  - model: sometimesanotion/Lamarck-14B-v0.6
    parameters:
      weight: 0.06  # Lower weight to allow more impactful models to dominate.
      density: 0.62  # Maintained for stable multi-step reasoning contribution.

  - model: hotmailuser/QwenSlerp2-14B
    parameters:
      weight: 0.11  # Increased slightly to balance contributions.
      density: 0.66  # Maintained for stable parameter integration.
```
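
A configuration like this is normally executed with mergekit's `mergekit-yaml` command-line tool (e.g. `mergekit-yaml config.yaml ./output`); exact flags depend on the installed mergekit version. Once merged, the model loads like any other Transformers causal LM. Below is a minimal usage sketch; the repository id is taken from this card, and the prompt and generation settings are only illustrative.

```python
# Minimal usage sketch: load the merged model with Transformers and generate text.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CultriX/Qwen2.5-14B-Hyperionv4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # loads in bfloat16, matching out_dtype above
    device_map="auto",    # requires the accelerate package
)

prompt = "Briefly explain why model merging can improve benchmark performance."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```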