Update README.md
README.md
@@ -15,7 +15,7 @@ tags:
 
 Qwen2.5-95B-Instruct is a [Qwen/Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct) self-merge made with [MergeKit](https://github.com/arcee-ai/mergekit/tree/main).
 
-The layer ranges chosen for this merge were inspired by a rough estimate of the layer similarity analysis of [ssmits/Falcon2-5.5B-multilingual](https://huggingface.co/ssmits/Falcon2-5.5B-multilingual). Layer similarity analysis involves examining the outputs of different layers in a neural network to determine how similar or different they are. This technique can help identify which layers contribute most significantly to the model's performance. In the context of the Falcon-11B model, layer similarity analysis across multiple languages revealed that
+The layer ranges chosen for this merge were inspired by a rough estimate of the layer similarity analysis of [ssmits/Falcon2-5.5B-multilingual](https://huggingface.co/ssmits/Falcon2-5.5B-multilingual). Layer similarity analysis involves examining the outputs of different layers in a neural network to determine how similar or different they are. This technique can help identify which layers contribute most significantly to the model's performance. In the context of the Falcon-11B model, layer similarity analysis across multiple languages revealed that the first half of the layers were more important for maintaining performance. Additionally, this analysis can be used to more rigidly slice and add extra layers for optimal Next Token Prediction, allowing for possibly a model architecture that's more creative and powerful.
 
 - [alpindale/goliath-120b](https://huggingface.co/alpindale/goliath-120b)
 - [cognitivecomputations/MegaDolphin-120b](https://huggingface.co/cognitivecomputations/MegaDolphin-120b)
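The README text in this diff describes layer similarity analysis only in prose. As a rough illustration, one common way to quantify it is the mean cosine similarity between the hidden states produced by consecutive layers for the same tokens. This is a minimal sketch under stated assumptions: it assumes the per-layer hidden states have already been extracted for some prompts, and the function names and data layout (`hidden_states[layer][token]`) are hypothetical. It is not the analysis actually run for ssmits/Falcon2-5.5B-multilingual or for this merge.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def layer_similarity(hidden_states: List[List[Vector]]) -> List[float]:
    """Hypothetical helper: hidden_states[l][t] is token t's vector after layer l.

    Returns one score per consecutive layer pair (l, l + 1): the mean cosine
    similarity over tokens. A score near 1.0 means layer l + 1 changed the
    representation very little.
    """
    scores = []
    for prev, curr in zip(hidden_states, hidden_states[1:]):
        sims = [cosine(p, c) for p, c in zip(prev, curr)]
        if not sims:
            raise ValueError("each layer needs at least one token vector")
        scores.append(sum(sims) / len(sims))
    return scores


# Toy data: 3 layers, 2 tokens, 2-dimensional hidden states.
toy = [
    [[1.0, 0.0], [0.0, 1.0]],  # layer 0
    [[1.0, 0.0], [0.0, 1.0]],  # layer 1: identical to layer 0
    [[0.0, 1.0], [1.0, 0.0]],  # layer 2: each vector rotated by 90 degrees
]
print(layer_similarity(toy))  # [1.0, 0.0]
```

Under this heuristic, layer pairs scoring close to 1.0 change the representation least, which is one way to reason about which layer ranges to slice or repeat in a self-merge.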