---
license: apache-2.0
tags:
- merge
- mergekit
- lazymergekit
- fblgit/UNA-TheBeagle-7b-v1
- udkai/Turdus
model-index:
- name: Marcoroni-7b-DPO-Merge
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 73.04
      name: normalized accuracy
    source:
      url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=nfaheem/Marcoroni-7b-DPO-Merge
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 88.8
      name: normalized accuracy
    source:
      url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=nfaheem/Marcoroni-7b-DPO-Merge
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 64.24
      name: accuracy
    source:
      url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=nfaheem/Marcoroni-7b-DPO-Merge
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 70.47
    source:
      url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=nfaheem/Marcoroni-7b-DPO-Merge
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 85.24
      name: accuracy
    source:
      url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=nfaheem/Marcoroni-7b-DPO-Merge
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 67.63
      name: accuracy
    source:
      url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=nfaheem/Marcoroni-7b-DPO-Merge
      name: Open LLM Leaderboard
---

# Marcoroni-7b-DPO-Merge

Marcoroni-7b-DPO-Merge is a TIES merge of the following models onto the base model [madatnlp/marcoroni-7b-v3-safetensor](https://huggingface.co./madatnlp/marcoroni-7b-v3-safetensor), built with [mergekit](https://github.com/cg123/mergekit) and inspired by [Maxime Labonne's work](https://medium.com/@mlabonne):
* [fblgit/UNA-TheBeagle-7b-v1](https://huggingface.co./fblgit/UNA-TheBeagle-7b-v1)
* [udkai/Turdus](https://huggingface.co./udkai/Turdus)

## 🧩 Configuration

```yaml
models:
  - model: madatnlp/marcoroni-7b-v3-safetensor
    # no parameters necessary for base model
  - model: fblgit/UNA-TheBeagle-7b-v1
    parameters:
      density: 0.3
      weight: 0.5
  - model: udkai/Turdus
    parameters:
      density: 0.7
      weight: 0.3
merge_method: ties
base_model: madatnlp/marcoroni-7b-v3-safetensor
parameters:
  normalize: true
dtype: float16
```
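To give a feel for what `merge_method: ties` does with the `density`, `weight`, and `normalize` settings above, here is a toy, pure-Python sketch on short parameter vectors. It follows the TIES-Merging recipe as we understand it: take each model's task vector (fine-tuned minus base), keep roughly the top `density` fraction of entries by magnitude, scale by `weight`, elect a per-parameter sign from the summed deltas, and keep only the deltas that agree with it. With `normalize: true` the sum is divided by the weights of the agreeing models. This is a simplified illustration, not mergekit's actual code; `trim`, `ties_merge`, and the toy vectors are hypothetical, and real merges operate on full weight tensors.

```python
# Toy sketch of TIES merging on flat lists of floats (illustrative only,
# NOT mergekit's implementation).


def trim(delta, density):
    """Keep roughly the top `density` fraction of entries by magnitude; zero the rest."""
    k = max(1, round(density * len(delta)))
    ranked = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(ranked[:k])
    return [d if i in keep else 0.0 for i, d in enumerate(delta)]


def sign(x):
    """Return +1, -1, or 0."""
    return (x > 0) - (x < 0)


def ties_merge(base, finetuned, densities, weights, normalize=True):
    # 1. Task vectors: fine-tuned parameters minus base parameters.
    deltas = [[f - b for f, b in zip(model, base)] for model in finetuned]
    # 2. Trim each task vector to its density, then scale it by its weight.
    deltas = [
        [w * d for d in trim(delta, dens)]
        for delta, dens, w in zip(deltas, densities, weights)
    ]
    merged = []
    for j, b in enumerate(base):
        column = [delta[j] for delta in deltas]
        # 3. Elect one sign per parameter from the sum of the weighted deltas.
        elected = sign(sum(column))
        # 4. Disjoint merge: only non-zero deltas that agree with the elected sign.
        agree = [i for i, d in enumerate(column) if d != 0 and sign(d) == elected]
        total = sum(column[i] for i in agree)
        if normalize:
            divisor = sum(weights[i] for i in agree)
            if divisor:
                total /= divisor
        merged.append(b + total)
    return merged


if __name__ == "__main__":
    # Hypothetical 4-parameter "models"; densities/weights mirror the config above.
    base = [0.0, 0.0, 0.0, 0.0]
    beagle = [1.0, -2.0, 0.5, 0.0]  # stands in for fblgit/UNA-TheBeagle-7b-v1
    turdus = [1.5, 1.0, -0.5, 3.0]  # stands in for udkai/Turdus
    print(ties_merge(base, [beagle, turdus], densities=[0.3, 0.7], weights=[0.5, 0.3]))
```

The sign-election step is what separates TIES from a plain weighted average: when two task vectors pull a parameter in opposite directions, the minority direction is dropped instead of partially cancelling the majority.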


## 💻 Example Python Code

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name_or_path = "nfaheem/Marcoroni-7b-DPO-Merge"
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    device_map="auto",          # requires the `accelerate` package
    torch_dtype=torch.float16,  # the merge was produced with dtype float16
    revision="main",
)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

prompt = "Write a story about llamas"
system_message = "You are a story writing assistant"  # not used by the plain template below
prompt_template = f'''{prompt}
'''

print("\n\n*** Generate:")

# Send inputs to the device the model was placed on instead of hard-coding .cuda()
input_ids = tokenizer(prompt_template, return_tensors="pt").input_ids.to(model.device)
output = model.generate(
    inputs=input_ids,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    max_new_tokens=512,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))

# Inference can also be done using the transformers pipeline
print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1,
)

print(pipe(prompt_template)[0]["generated_text"])
```

## 📋 Evaluation Summary

| Average | ARC   | HellaSwag | MMLU  | TruthfulQA | Winogrande | GSM8K |
|---------|-------|-----------|-------|------------|------------|-------|
| 74.9    | 73.04 | 88.8      | 64.24 | 70.47      | 85.24      | 67.63 |
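The Average column is consistent with the unweighted arithmetic mean of the six benchmark scores. A quick check using only the numbers in the table:

```python
# Scores copied from the evaluation summary table.
scores = {
    "ARC": 73.04,
    "HellaSwag": 88.8,
    "MMLU": 64.24,
    "TruthfulQA": 70.47,
    "Winogrande": 85.24,
    "GSM8K": 67.63,
}

# Unweighted mean over the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"Average: {average:.2f}")  # Average: 74.90
```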


## 📈 Hugging Face Leaderboard
As of 01/15/2024, it ranked #1 on the Hugging Face Open LLM Leaderboard among models of around 13B parameters or fewer.

| Model                          | Average | ARC   | HellaSwag | MMLU  | TruthfulQA | Winogrande | GSM8K |
|--------------------------------|---------|-------|-----------|-------|------------|------------|-------|
| nfaheem/Marcoroni-7b-DPO-Merge | 74.9    | 73.04 | 88.8      | 64.24 | 70.47      | 85.24      | 67.63 |
| mlabonne/Beagle14-7b           | 74.76   | 72.95 | 87.95     | 64.7  | 68.38      | 82.64      | 71.42 |
| udkai/Turdus                   | 74.66   | 73.38 | 88.56     | 64.52 | 67.11      | 86.66      | 67.7  |
| CultriX/MergeTrix-7B           | 74.33   | 72.24 | 87.84     | 64.88 | 66.27      | 83.5       | 71.19 |
| fblgit/UNA-TheBeagle-7b-v1     | 73.87   | 73.04 | 88        | 63.48 | 69.85      | 82.16      | 66.72 |
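To make the comparison with the two merged-in models explicit, the sketch below computes, per benchmark, the difference between Marcoroni-7b-DPO-Merge and the better of udkai/Turdus and fblgit/UNA-TheBeagle-7b-v1, using only the numbers in the leaderboard table. The base model madatnlp/marcoroni-7b-v3-safetensor is not listed there, so it is left out; `gap_vs_best_parent` is a hypothetical helper name.

```python
# Rows copied from the leaderboard table, in column order.
BENCHMARKS = ["ARC", "HellaSwag", "MMLU", "TruthfulQA", "Winogrande", "GSM8K"]
rows = {
    "nfaheem/Marcoroni-7b-DPO-Merge": [73.04, 88.8, 64.24, 70.47, 85.24, 67.63],
    "udkai/Turdus": [73.38, 88.56, 64.52, 67.11, 86.66, 67.7],
    "fblgit/UNA-TheBeagle-7b-v1": [73.04, 88.0, 63.48, 69.85, 82.16, 66.72],
}


def gap_vs_best_parent(merged, parents):
    """Per-benchmark score of the merge minus the best score among its parents."""
    return {
        name: round(merged[i] - max(p[i] for p in parents), 2)
        for i, name in enumerate(BENCHMARKS)
    }


merged = rows["nfaheem/Marcoroni-7b-DPO-Merge"]
parents = [rows["udkai/Turdus"], rows["fblgit/UNA-TheBeagle-7b-v1"]]
gaps = gap_vs_best_parent(merged, parents)
for bench, gap in gaps.items():
    print(f"{bench:>11}: {gap:+.2f}")
```

On these numbers the merge beats both parents on HellaSwag and TruthfulQA and trails the better parent slightly on the other four benchmarks; most of its Average lead over udkai/Turdus comes from TruthfulQA.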


![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b6cc785fd617abdfec6bed/0PE-ffmkezG1S6CqScPAv.png)


# [Open LLM Leaderboard Evaluation Results](https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found in the [leaderboard details dataset](https://huggingface.co./datasets/open-llm-leaderboard/details_nfaheem__Marcoroni-7b-DPO-Merge).

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 74.90 |
| AI2 Reasoning Challenge (25-Shot) | 73.04 |
| HellaSwag (10-Shot)               | 88.80 |
| MMLU (5-Shot)                     | 64.24 |
| TruthfulQA (0-shot)               | 70.47 |
| Winogrande (5-shot)               | 85.24 |
| GSM8k (5-shot)                    | 67.63 |