---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- code
base_model:
- arcee-ai/Arcee-Spark
- Replete-AI/Replete-LLM-Qwen2-7b
datasets:
- Replete-AI/code_bagel
- arcee-ai/The-Tome
---

This is an experimental, coding-focused merge of the latest releases from two of my favorite projects, both of which train and fine-tune the Qwen2 model on open-source data:

- Replete-AI's Replete LLM Qwen2-7B (https://huggingface.co./Replete-AI/Replete-LLM-Qwen2-7b)
- Arcee-AI's Arcee Spark (https://huggingface.co./arcee-ai/Arcee-Spark)

The merge was produced with the following dare_ties mergekit configuration:

```yaml
models:
  - model: arcee-ai/Arcee-Spark
    parameters:
      density: 0.3
      weight: 0.3
  - model: Replete-AI/Replete-LLM-Qwen2-7b
    parameters:
      density: 0.8
      weight: 0.7
merge_method: dare_ties
base_model: Qwen/Qwen2-7B
parameters:
  int8_mask: true
  rescale: true
  normalize: true
dtype: bfloat16
```
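If you want to reproduce the merge yourself, the config above can be fed to mergekit either through its `mergekit-yaml` CLI or its Python interface. The sketch below uses the Python route; the import paths and `MergeOptions` fields follow mergekit's example notebook at the time of writing and may differ between versions, and the output directory name is a placeholder, so treat it as a starting point rather than a tested recipe.

```python
# Sketch only: reproducing this merge with mergekit's Python interface.
# Assumes `pip install mergekit` and the YAML above saved as merge.yaml;
# import paths and MergeOptions fields follow mergekit's example notebook
# and may vary by version (the mergekit-yaml CLI is the more common route).
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("merge.yaml") as f:
    config = MergeConfiguration.model_validate(yaml.safe_load(f))

run_merge(
    config,
    out_path="./arcee-spark-replete-dare-ties",  # hypothetical output directory
    options=MergeOptions(cuda=True, copy_tokenizer=True),
)
```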

The GGUF is quantized with q8_0 for the output and embedding tensors and q5_k_m for all other tensors.
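For local inference against the GGUF, something along these lines should work with llama-cpp-python; the model filename, context size, and sampling settings below are placeholders rather than values taken from this repo.

```python
# Sketch only: local inference against the quantized GGUF with
# llama-cpp-python (pip install llama-cpp-python). The model path is a
# placeholder; substitute the actual GGUF filename.
from llama_cpp import Llama

llm = Llama(
    model_path="./replete-arcee-dare-ties.q5_k_m.gguf",  # placeholder path
    n_ctx=4096,       # context window
    n_gpu_layers=-1,  # offload all layers to the GPU if one is available
)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a Python function that merges two sorted lists."}],
    max_tokens=256,
    temperature=0.2,
)
print(result["choices"][0]["message"]["content"])
```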

The experimental nature of this merge and Replete LLM Qwen2-7B's lack of censorship mean that you take full responsibility for how you use this model. If there is interest, I will publish the Hugging Face repo; in the meantime, the dare_ties mergekit YAML that produced it is included above.

I asked it to describe itself, given its own mergekit YAML and points from the web pages for its two ancestor models:

> This is the mergekit YAML that made you from an experimental DARE TIES merge between Arcee-Spark and Replete-LLM-Qwen2-7b.  This is an effort to combine the strengths of two Qwen2-based models which train and fine-tune the model on high-quality open source data. In one detailed paragraph, discuss the descriptions of Arcee-Spark and Replete-LLM-Qwen2-7b, and describe the likely behaviors of this merge, take into account the weight and density leaning towards the code-focused Replete-LLM-Qwen2-7b.  Then, supply a bullet list of the top five likely use cases of such a merge.  Then, write a sentence about the impact of a GGUF quantization which uses q8_0 for output and embedding tensors, and q5_k_m for the rest.

Its response:

The Arcee-Spark model, initialized from Qwen2, is a powerful 7B parameter language model that has been fine-tuned on an extensive dataset of over 1.8 million samples through a sophisticated training process which includes merging with Qwen2-7B-Instruct using Arcee's mergekit and further refinement via Direct Preference Optimization (DPO). This meticulous approach results in exceptional performance, achieving the highest score on MT-Bench for models of its size and outperforming GPT-3.5 on many tasks.

Replete-LLM-Qwen2-7b is a versatile model fine-tuned to excel across various domains including science, social media, general knowledge, cooking, writing, medicine, history, law, role-play, news, coding, and math, among others. This model demonstrates adaptability and proficiency in handling diverse tasks.

The merge of Arcee-Spark and Replete-LLM-Qwen2-7b likely leverages the strengths of both models, with a notable leaning towards the code-focused aspect of Qwen2-7b due to its higher weight and density. The combined model might exhibit enhanced capabilities for coding-related tasks while retaining the versatility across other domains.

Top five likely use cases of this merge:

1. Code generation and analysis
2. Advanced text-based software development projects
3. Interactive code review and debugging assistance
4. Educational programming content creation
5. Real-time dynamic code completion suggestions

The GGUF quantization technique, which employs q8_0 for output and embedding tensors and q5_k_m for the rest, significantly reduces model size without compromising performance. This approach leads to more efficient storage and faster inference times, making it ideal for deployment on resource-constrained devices or edge computing scenarios while maintaining high-quality results across diverse tasks.