ssmits/Qwen2.5-95B-Instruct


ssmits committed on Sep 25, 2024

Commit 9c0e7df · verified · 1 Parent(s): 52add26

Update README.md

Files changed (1)

README.md +1 -1


@@ -15,7 +15,7 @@ tags:
 Qwen2.5-95B-Instruct is a [Qwen/Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct) self-merge made with [MergeKit](https://github.com/arcee-ai/mergekit/tree/main).
-The layer ranges chosen for this merge were inspired by a rough estimate of the layer similarity analysis of [ssmits/Falcon2-5.5B-multilingual](https://huggingface.co/ssmits/Falcon2-5.5B-multilingual). Layer similarity analysis involves examining the outputs of different layers in a neural network to determine how similar or different they are. This technique can help identify which layers contribute most significantly to the model's performance. In the context of the Falcon-11B model, layer similarity analysis across multiple languages revealed that certain layers were more important for maintaining performance. Additionally, this analysis can be used to more rigidly structure the LLM for optimal Next Token Prediction, allowing for a more efficient and effective language model architecture.
+The layer ranges chosen for this merge were inspired by a rough estimate of the layer similarity analysis of [ssmits/Falcon2-5.5B-multilingual](https://huggingface.co/ssmits/Falcon2-5.5B-multilingual). Layer similarity analysis involves examining the outputs of different layers in a neural network to determine how similar or different they are. This technique can help identify which layers contribute most significantly to the model's performance. In the context of the Falcon-11B model, layer similarity analysis across multiple languages revealed that the first half of the layers were more important for maintaining performance. Additionally, this analysis can be used to more rigidly slice and add extra layers for optimal Next Token Prediction, allowing for possibly a model architecture that's more creative and powerful.
 - [alpindale/goliath-120b](https://huggingface.co/alpindale/goliath-120b)
 - [cognitivecomputations/MegaDolphin-120b](https://huggingface.co/cognitivecomputations/MegaDolphin-120b)
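
For readers unfamiliar with the layer similarity analysis the changed paragraph refers to, here is a minimal sketch of one common way to compare layer outputs: the mean per-token cosine similarity between each layer and the next. This is an illustration under stated assumptions, not the procedure used for this merge or for Falcon2-5.5B-multilingual. The function names `cosine` and `adjacent_layer_similarity` and the toy data are hypothetical, and a real analysis would use hidden states captured from the model rather than hand-written vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def adjacent_layer_similarity(layer_outputs):
    """Mean per-token cosine similarity between each layer's output and the next layer's.

    layer_outputs: list of layers, each a list of per-token vectors.
    Returns one score per adjacent pair of layers.
    """
    sims = []
    for a, b in zip(layer_outputs, layer_outputs[1:]):
        per_token = [cosine(u, v) for u, v in zip(a, b)]
        sims.append(sum(per_token) / len(per_token))
    return sims


# Hypothetical toy data: 3 "layers", 2 tokens each, 2-dimensional hidden states.
layers = [
    [[1.0, 0.0], [0.0, 2.0]],  # layer 0
    [[2.0, 0.0], [0.0, 1.0]],  # layer 1: same directions as layer 0
    [[0.0, 3.0], [4.0, 0.0]],  # layer 2: orthogonal to layer 1
]
sims = adjacent_layer_similarity(layers)
```

In this toy case the first pair of layers scores as identical in direction and the second pair as orthogonal. Under this kind of measure, adjacent layers with high similarity change the representation little, which is one heuristic for deciding which layer ranges to slice or repeat.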