mlabonne/BigQwen2.5-Echo-47B-Instruct

@@ -1,27 +1,45 @@
 ---
-base_model:
-- Qwen/Qwen2.5-32B-Instruct
 library_name: transformers
 tags:
 - mergekit
 - merge
 ---
-# merge2
-This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
-## Merge Details
-### Merge Method
-This model was merged using the passthrough merge method.
-### Models Merged
-The following models were included in the merge:
-* [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct)
-### Configuration
 The following YAML configuration was used to produce this model:
@@ -178,3 +196,28 @@ slices:
 merge_method: passthrough
 dtype: bfloat16
 ```

 ---
+license: other
+license_name: tongyi-qianwen
+license_link: https://huggingface.co/Qwen/Qwen2-72B-Instruct/blob/main/LICENSE
+language:
+- en
+pipeline_tag: text-generation
 library_name: transformers
 tags:
 - mergekit
 - merge
+- lazymergekit
+base_model:
+- Qwen/Qwen2.5-32B-Instruct
 ---
+# BigQwen2.5-Echo-47B-Instruct
+![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/98GiKtmH1AtHHbIbOUH4Y.jpeg)
+BigQwen2.5-Echo-47B-Instruct is a [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) self-merge made with [MergeKit](https://github.com/arcee-ai/mergekit/tree/main).
+## 🔉 Echo Merge
+I've tried a more gradual approach with a **distributed repetition pattern**. Instead of replicating blocks of 8 or more layers, I'm replicating individual layers in these blocks:
+- First 8 layers: No replication
+- Next 8 layers: Replicate 2 layers (first one, middle one)
+- Next 8 layers: Replicate 4 layers (1st, 3rd, 5th, 7th)
+- Next 16 layers: Replicate 16 layers (all of them)
+- Next 8 layers: Replicate 4 layers (1st, 3rd, 5th, 7th)
+- Next 8 layers: Replicate 2 layers (first one, middle one)
+- Last 8 layers: No replication
+I used this string to visualize the pattern, where each 0 is an original layer and each 1 is a duplicated one (the order of 0s and 1s within a block doesn't matter):
+```
+00000000 1000010000 100100100100 1010101010101010 1010101010101010 100100100100 1000010000 00000000
+```
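As a quick sanity check, the visualization string can be expanded into the sequence of source layers it implies. This is only a minimal sketch: `expand_pattern` is a hypothetical helper, and it assumes each `1` marks one extra copy of the original layer (`0`) that follows it, which is one possible reading since the order within a block doesn't matter.

```python
# Visualization string copied from the model card.
PATTERN = (
    "00000000 1000010000 100100100100 1010101010101010 "
    "1010101010101010 100100100100 1000010000 00000000"
)


def expand_pattern(pattern: str) -> list[int]:
    """Return the source layer index for every layer of the merged stack.

    Assumption (not stated in the card): a '1' adds one extra copy of the
    next original layer ('0').
    """
    merged = []
    source_layer = 0   # index of the next original layer
    extra_copies = 0   # number of '1's seen since the last '0'
    for char in pattern.replace(" ", ""):
        if char == "1":
            extra_copies += 1
        elif char == "0":
            merged.extend([source_layer] * (1 + extra_copies))
            source_layer += 1
            extra_copies = 0
        else:
            raise ValueError(f"unexpected character: {char!r}")
    if extra_copies:
        raise ValueError("pattern ends with a '1' that has no layer to copy")
    return merged


merged_layers = expand_pattern(PATTERN)
original_count = len(set(merged_layers))
print("original layers:  ", original_count)
print("duplicated layers:", len(merged_layers) - original_count)
print("merged layers:    ", len(merged_layers))
```

Under this reading, the string describes 64 original layers plus 28 duplicates, for 92 layers in total.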
+The main idea is that a middle layer changes its input only slightly, so replicating a middle layer has a small impact on the output.
+The additional layers are meant to increase the model's capacity without breaking the information flow. Breaking that flow is what often creates "insane" self-merges.
+## 🧩 Configuration
 The following YAML configuration was used to produce this model:
 merge_method: passthrough
 dtype: bfloat16
 ```
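For illustration, here is one way a list of source layer indices could be turned into passthrough-style slices. This is a sketch and not the configuration used for this model: `to_layer_ranges` and `build_config` are hypothetical helpers, the toy index list is made up, and the layout assumes MergeKit's `slices` / `sources` / `layer_range` keys with an exclusive end index.

```python
MODEL = "Qwen/Qwen2.5-32B-Instruct"


def to_layer_ranges(layer_indices):
    """Group source layer indices into contiguous [start, end) ranges.

    A new range starts whenever the next index is not the previous one
    plus one, e.g. when a layer is repeated.
    """
    ranges = []
    start = prev = layer_indices[0]
    for idx in layer_indices[1:]:
        if idx != prev + 1:
            ranges.append((start, prev + 1))
            start = idx
        prev = idx
    ranges.append((start, prev + 1))
    return ranges


def build_config(layer_indices, model=MODEL):
    """Build a passthrough-style config as a plain dict (assumed layout)."""
    return {
        "slices": [
            {"sources": [{"model": model, "layer_range": [start, end]}]}
            for start, end in to_layer_ranges(layer_indices)
        ],
        "merge_method": "passthrough",
        "dtype": "bfloat16",
    }


# Toy example: six source layers where layers 2 and 4 are each used twice.
toy_indices = [0, 1, 2, 2, 3, 4, 4, 5]
toy_ranges = to_layer_ranges(toy_indices)
toy_config = build_config(toy_indices)
print(toy_ranges)
```

Each duplicated layer ends one slice and begins the next, so the slices overlap by exactly the repeated layer.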
+## 💻 Usage
+```python
+# Install dependencies first (in a notebook cell): !pip install -qU transformers accelerate
+from transformers import AutoTokenizer
+import transformers
+import torch
+
+model = "mlabonne/BigQwen2.5-Echo-47B-Instruct"
+messages = [{"role": "user", "content": "What is a large language model?"}]
+
+# Format the conversation with the model's chat template
+tokenizer = AutoTokenizer.from_pretrained(model)
+prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+
+# Load the model in half precision and spread it across the available devices
+pipeline = transformers.pipeline(
+    "text-generation",
+    model=model,
+    torch_dtype=torch.float16,
+    device_map="auto",
+)
+
+# Sample a completion and print the prompt followed by the generated text
+outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
+print(outputs[0]["generated_text"])
+```