---
license: apache-2.0
language:
- en
tags:
- merge
base_model:
- teknium/OpenHermes-2.5-Mistral-7B
- Intel/neural-chat-7b-v3-3
---
# Model Description
This is an experiment to compare merging two models with DARE TIES versus SLERP 🦙

We are mainly interested in comparing against [Weyaxi/OpenHermes-2.5-neural-chat-v3-3-Slerp](https://huggingface.co./Weyaxi/OpenHermes-2.5-neural-chat-v3-3-Slerp).

The two models involved in the merge are as follows:
1. [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co./teknium/OpenHermes-2.5-Mistral-7B)
2. [Intel/neural-chat-7b-v3-3](https://huggingface.co./Intel/neural-chat-7b-v3-3)
- base model: [mistralai/Mistral-7B-v0.1](https://huggingface.co./mistralai/Mistral-7B-v0.1)
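
For intuition, the DARE step sparsifies each fine-tuned model's task vector (its delta from the base model) by randomly dropping parameters and rescaling the survivors, before the TIES-style sign election and merge. Below is a minimal sketch of that drop-and-rescale step, assuming `density` is the keep probability; the function is illustrative, not mergekit's actual implementation.

```python
import torch

def dare_sparsify(task_vector: torch.Tensor, density: float) -> torch.Tensor:
    # Drop-And-REscale: keep each delta with probability `density`,
    # then rescale by 1/density so the expected update is unchanged.
    mask = torch.bernoulli(torch.full_like(task_vector, density))
    return task_vector * mask / density

# Illustrative stand-in for (fine-tuned weights - base weights):
delta = torch.randn(4096, 4096)
sparse_delta = dare_sparsify(delta, density=0.5)  # density 0.5, as in the config below
```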
The YAML config file for the merge is:
```yaml
models:
  - model: mistralai/Mistral-7B-v0.1
    # no parameters necessary for base model
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      weight: 0.5
      density: 0.5
  - model: Intel/neural-chat-7b-v3-3
    parameters:
      weight: 0.5
      density: 0.5
merge_method: dare_ties
base_model: mistralai/Mistral-7B-v0.1
parameters:
  int8_mask: true
dtype: bfloat16
```
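
The merge can be reproduced by feeding the config above to [mergekit](https://github.com/arcee-ai/mergekit), e.g. via its `mergekit-yaml` CLI. Once merged, the result loads like any other Mistral-7B model; here is a minimal usage sketch, where the model id is a placeholder for wherever the merged weights end up:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/merged-model"  # placeholder: local output dir or Hub repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the `dtype` in the merge config
    device_map="auto",
)

prompt = "Compare DARE TIES and SLERP model merging in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```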
# Open LLM Leaderboard
The scores below are from the Open LLM Leaderboard. Note that with more hyperparameter tuning (e.g. per-model weights and densities), DARE TIES might achieve better results.
| Benchmark  | DARE TIES | SLERP |
|------------|-----------|-------|
| Average    | 70.69     | 71.38 |
| ARC        | 67.49     | 68.09 |
| HellaSwag  | 85.78     | 86.20 |
| MMLU       | 64.10     | 64.26 |
| TruthfulQA | 60.52     | 62.78 |
| Winogrande | 79.01     | 79.16 |
| GSM8K      | 67.25     | 67.78 |