File size: 2,816 Bytes

3a13973

---
base_model:
- DavidAU/MN-Dark-Planet-TITAN-12B
- DavidAU/MN-GRAND-Gutenberg-Lyra4-Lyra-12B-DARKNESS
- IlyaGusev/saiga_nemo_12b
- DavidAU/MN-GRAND-Gutenberg-Lyra4-Lyra-12B-MADNESS
- win10/Mistral-Nemo-abliterated-Nemo-Pro-v2
library_name: transformers
tags:
- mergekit
- merge

---
# merge

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details
### Merge Method

This model was merged using the [TIES](https://arxiv.org/abs/2306.01708) merge method using [win10/Mistral-Nemo-abliterated-Nemo-Pro-v2](https://huggingface.co./win10/Mistral-Nemo-abliterated-Nemo-Pro-v2) as a base.

### Models Merged

The following models were included in the merge:
* [DavidAU/MN-Dark-Planet-TITAN-12B](https://huggingface.co./DavidAU/MN-Dark-Planet-TITAN-12B)
* [DavidAU/MN-GRAND-Gutenberg-Lyra4-Lyra-12B-DARKNESS](https://huggingface.co./DavidAU/MN-GRAND-Gutenberg-Lyra4-Lyra-12B-DARKNESS)
* [IlyaGusev/saiga_nemo_12b](https://huggingface.co./IlyaGusev/saiga_nemo_12b)
* [DavidAU/MN-GRAND-Gutenberg-Lyra4-Lyra-12B-MADNESS](https://huggingface.co./DavidAU/MN-GRAND-Gutenberg-Lyra4-Lyra-12B-MADNESS)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
models:      
  - model: DavidAU/MN-Dark-Planet-TITAN-12B
    parameters:
      density: 1
      weight: 1

  - model: DavidAU/MN-GRAND-Gutenberg-Lyra4-Lyra-12B-MADNESS
    parameters:
      density: 1
      weight: 1

  - model: DavidAU/MN-GRAND-Gutenberg-Lyra4-Lyra-12B-DARKNESS
    parameters:
      density: 1
      weight: 1

  - model: IlyaGusev/saiga_nemo_12b
    parameters:
      density: 1
      weight: 1    

merge_method: ties
base_model: win10/Mistral-Nemo-abliterated-Nemo-Pro-v2
  
dtype: float32
chat_template: chatml
# Regularization
regularization:
  - method: gradient_penalty
    scale: 0.05  # Increased influence for gradient control
  - method: weight_clipping
    clip_range: [-0.2, 0.2]  # Broader clipping range for flexibility
  - method: random_noise
    scale: 0.01  # Stronger noise injection
  - method: attention_dropout
    scale: 0.1  # Higher dropout to reduce attention fixation

# Postprocessing
postprocessing:
  - operation: entropy_regularization
    scale: 0.05  # Stronger encouragement for diverse outputs
  - operation: non_linear_scaling
    parameters:
      function: tanh
  - operation: sharpening
    intensity: 0.5  # Enhanced sharpening for precise outputs
  - operation: gaussian_smoothing
    sigma: 1.5  # Increased smoothing for stable outputs
  - operation: normalize
  - operation: dynamic_scaling
    scale_range: [0.8, 1.2]  # Expanded dynamic range for scaling
  - operation: smoothing
    parameters:
      adaptive: true
      range: [0.85, 1.15]  # Wider adaptive smoothing range
      kernel_size: 5

```