|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- teknium/OpenHermes-2.5 |
|
tags: |
|
- axolotl |
|
- 01-ai/Yi-1.5-9B-Chat |
|
- finetune |
|
--- |
|
|
|
|
|
# Hermes-2.5-Yi-1.5-9B-Chat |
|
|
|
This model is a fine-tuned version of [01-ai/Yi-1.5-9B-Chat](https://huggingface.co./01-ai/Yi-1.5-9B-Chat) on the [teknium/OpenHermes-2.5](https://huggingface.co./datasets/teknium/OpenHermes-2.5) dataset. |
|
I'm very happy with the results. The model now seems a lot smarter and more "aware" in certain situations (first impressions, so I might revise this opinion with more usage). It has quite a big edge on the AGIEval benchmark compared to other models in its class.
|
I plan to extend its context length to 32k tokens with PoSE.
|
|
|
## Model Details |
|
|
|
- **Base Model:** 01-ai/Yi-1.5-9B-Chat |
|
- **Chat Template:** ChatML
|
- **Dataset:** teknium/OpenHermes-2.5 |
|
- **Sequence Length:** 8192 tokens |
|
- **Training:** |
|
- **Epochs:** 1 |
|
- **Hardware:** 4 Nodes x 4 NVIDIA A100 40GB GPUs |
|
- **Duration:** 48:32:13 (hh:mm:ss)
|
- **Cluster:** KIT SCC Cluster |
|
|
|
## Benchmarks (0-shot)
|
|
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/659c4ecb413a1376bee2f661/0wv3AMaoete7ysT005n89.png) |
|
|
|
| Benchmark | Score | |
|
|-------------------|--------| |
|
| ARC (Challenge) | 52.47% | |
|
| ARC (Easy) | 81.65% | |
|
| BoolQ | 87.22% | |
|
| HellaSwag | 60.52% | |
|
| OpenBookQA | 33.60% | |
|
| PIQA | 81.12% | |
|
| Winogrande | 72.22% | |
|
| AGIEval | 38.46% | |
|
| TruthfulQA | 44.22% | |
|
| MMLU | 59.72% | |
|
| IFEval | 47.96% | |
|
|
|
|
|
For detailed benchmark results, including sub-categories and various metrics, please refer to the [full benchmark table](#full-benchmark-results) at the end of this README. |
|
|
|
## GGUF and Quantizations |
|
|
|
- llama.cpp [b3166](https://github.com/ggerganov/llama.cpp/releases/tag/b3166) |
|
- [juvi21/Hermes-2.5-Yi-1.5-9B-Chat-GGUF](https://huggingface.co./juvi21/Hermes-2.5-Yi-1.5-9B-Chat-GGUF) is available in:

  - **F16**, **Q8_0**, **Q6_K**, **Q5_K_M**, **Q4_K_M**, **Q3_K_M**, **Q2_K**
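
If you want to run the GGUF files locally, a minimal sketch with the `llama-cpp-python` bindings might look like the following. The GGUF filename below is an assumption; substitute whichever quantization you downloaded.

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="Hermes-2.5-Yi-1.5-9B-Chat.Q4_K_M.gguf",  # assumed local filename
    n_ctx=8192,           # matches the training sequence length
    n_gpu_layers=-1,      # offload all layers to GPU if one is available
    chat_format="chatml", # the model was fine-tuned with the ChatML template
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the question to 42?"},
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```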
|
|
|
|
|
|
|
## Usage |
|
|
|
To use this model, you can load it using the Hugging Face Transformers library: |
|
|
|
```python |
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
|
model = AutoModelForCausalLM.from_pretrained("juvi21/Hermes-2.5-Yi-1.5-9B-Chat") |
|
tokenizer = AutoTokenizer.from_pretrained("juvi21/Hermes-2.5-Yi-1.5-9B-Chat") |
|
|
|
# Generate text |
|
input_text = "What is the question to 42?" |
|
inputs = tokenizer(input_text, return_tensors="pt") |
|
outputs = model.generate(**inputs) |
|
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True) |
|
print(generated_text) |
|
|
|
``` |
|
|
|
## Prompt Format (ChatML)
|
``` |
|
<|im_start|>system |
|
{system_prompt}<|im_end|> |
|
<|im_start|>user |
|
Knock Knock, who is there?<|im_end|> |
|
<|im_start|>assistant |
|
Hi there! <|im_end|> |
|
``` |
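
Rather than writing the `<|im_start|>`/`<|im_end|>` tags by hand, you can let the tokenizer's chat template build the prompt. A minimal sketch, assuming the ChatML template ships in the tokenizer config:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "juvi21/Hermes-2.5-Yi-1.5-9B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Build the ChatML prompt from a message list (assumes the chat template
# is present in the tokenizer config).
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Knock Knock, who is there?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

outputs = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```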
|
## License |
|
|
|
This model is released under the Apache 2.0 license. |
|
|
|
## Acknowledgements |
|
|
|
Special thanks to: |
|
- Teknium for the great OpenHermes-2.5 dataset |
|
- 01-ai for their great model |
|
- KIT SCC for FLOPS |
|
|
|
## Citation |
|
|
|
If you use this model in your research, please consider citing it. In any case, definitely cite NousResearch (for OpenHermes-2.5) and 01-ai (for Yi-1.5):
|
|
|
```bibtex |
|
@misc{juvi21_2024_hermes25_yi15_9b_chat,
  author = {juvi21},
  title  = {Hermes-2.5-Yi-1.5-9B-Chat},
  year   = {2024},
}
|
``` |
|
## Full Benchmark Results
|
|
|
| Tasks |Version|Filter|n-shot| Metric | | Value | |Stderr| |
|
|---------------------------------------|-------|------|-----:|-----------------------|---|------:|---|------| |
|
|agieval |N/A |none | 0|acc |↑ | 0.5381|± |0.0049| |
|
| | |none | 0|acc_norm |↑ | 0.5715|± |0.0056| |
|
| - agieval_aqua_rat | 1|none | 0|acc |↑ | 0.3858|± |0.0306| |
|
| | |none | 0|acc_norm |↑ | 0.3425|± |0.0298| |
|
| - agieval_gaokao_biology | 1|none | 0|acc |↑ | 0.6048|± |0.0338| |
|
| | |none | 0|acc_norm |↑ | 0.6000|± |0.0339| |
|
| - agieval_gaokao_chemistry | 1|none | 0|acc |↑ | 0.4879|± |0.0348| |
|
| | |none | 0|acc_norm |↑ | 0.4106|± |0.0343| |
|
| - agieval_gaokao_chinese | 1|none | 0|acc |↑ | 0.5935|± |0.0314| |
|
| | |none | 0|acc_norm |↑ | 0.5813|± |0.0315| |
|
| - agieval_gaokao_english | 1|none | 0|acc |↑ | 0.8235|± |0.0218| |
|
| | |none | 0|acc_norm |↑ | 0.8431|± |0.0208| |
|
| - agieval_gaokao_geography | 1|none | 0|acc |↑ | 0.7085|± |0.0323| |
|
| | |none | 0|acc_norm |↑ | 0.6985|± |0.0326| |
|
| - agieval_gaokao_history | 1|none | 0|acc |↑ | 0.7830|± |0.0269| |
|
| | |none | 0|acc_norm |↑ | 0.7660|± |0.0277| |
|
| - agieval_gaokao_mathcloze | 1|none | 0|acc |↑ | 0.0508|± |0.0203| |
|
| - agieval_gaokao_mathqa | 1|none | 0|acc |↑ | 0.3761|± |0.0259| |
|
| | |none | 0|acc_norm |↑ | 0.3590|± |0.0256| |
|
| - agieval_gaokao_physics | 1|none | 0|acc |↑ | 0.4950|± |0.0354| |
|
| | |none | 0|acc_norm |↑ | 0.4700|± |0.0354| |
|
| - agieval_jec_qa_ca | 1|none | 0|acc |↑ | 0.6557|± |0.0150| |
|
| | |none | 0|acc_norm |↑ | 0.5926|± |0.0156| |
|
| - agieval_jec_qa_kd | 1|none | 0|acc |↑ | 0.7310|± |0.0140| |
|
| | |none | 0|acc_norm |↑ | 0.6610|± |0.0150| |
|
| - agieval_logiqa_en | 1|none | 0|acc |↑ | 0.5177|± |0.0196| |
|
| | |none | 0|acc_norm |↑ | 0.4839|± |0.0196| |
|
| - agieval_logiqa_zh | 1|none | 0|acc |↑ | 0.4854|± |0.0196| |
|
| | |none | 0|acc_norm |↑ | 0.4501|± |0.0195| |
|
| - agieval_lsat_ar | 1|none | 0|acc |↑ | 0.2913|± |0.0300| |
|
| | |none | 0|acc_norm |↑ | 0.2696|± |0.0293| |
|
| - agieval_lsat_lr | 1|none | 0|acc |↑ | 0.7196|± |0.0199| |
|
| | |none | 0|acc_norm |↑ | 0.6824|± |0.0206| |
|
| - agieval_lsat_rc | 1|none | 0|acc |↑ | 0.7212|± |0.0274| |
|
| | |none | 0|acc_norm |↑ | 0.6989|± |0.0280| |
|
| - agieval_math | 1|none | 0|acc |↑ | 0.0910|± |0.0091| |
|
| - agieval_sat_en | 1|none | 0|acc |↑ | 0.8204|± |0.0268| |
|
| | |none | 0|acc_norm |↑ | 0.8301|± |0.0262| |
|
| - agieval_sat_en_without_passage | 1|none | 0|acc |↑ | 0.5194|± |0.0349| |
|
| | |none | 0|acc_norm |↑ | 0.4806|± |0.0349| |
|
| - agieval_sat_math | 1|none | 0|acc |↑ | 0.5864|± |0.0333| |
|
| | |none | 0|acc_norm |↑ | 0.5409|± |0.0337| |
|
|arc_challenge | 1|none | 0|acc |↑ | 0.5648|± |0.0145| |
|
| | |none | 0|acc_norm |↑ | 0.5879|± |0.0144| |
|
|arc_easy | 1|none | 0|acc |↑ | 0.8241|± |0.0078| |
|
| | |none | 0|acc_norm |↑ | 0.8165|± |0.0079| |
|
|boolq | 2|none | 0|acc |↑ | 0.8624|± |0.0060| |
|
|hellaswag | 1|none | 0|acc |↑ | 0.5901|± |0.0049| |
|
| | |none | 0|acc_norm |↑ | 0.7767|± |0.0042| |
|
|ifeval | 2|none | 0|inst_level_loose_acc |↑ | 0.5156|± |N/A | |
|
| | |none | 0|inst_level_strict_acc |↑ | 0.4748|± |N/A | |
|
| | |none | 0|prompt_level_loose_acc |↑ | 0.3863|± |0.0210| |
|
| | |none | 0|prompt_level_strict_acc|↑ | 0.3309|± |0.0202| |
|
|mmlu |N/A |none | 0|acc |↑ | 0.6942|± |0.0037| |
|
| - abstract_algebra | 0|none | 0|acc |↑ | 0.4900|± |0.0502| |
|
| - anatomy | 0|none | 0|acc |↑ | 0.6815|± |0.0402| |
|
| - astronomy | 0|none | 0|acc |↑ | 0.7895|± |0.0332| |
|
| - business_ethics | 0|none | 0|acc |↑ | 0.7600|± |0.0429| |
|
| - clinical_knowledge | 0|none | 0|acc |↑ | 0.7132|± |0.0278| |
|
| - college_biology | 0|none | 0|acc |↑ | 0.8056|± |0.0331| |
|
| - college_chemistry | 0|none | 0|acc |↑ | 0.5300|± |0.0502| |
|
| - college_computer_science | 0|none | 0|acc |↑ | 0.6500|± |0.0479| |
|
| - college_mathematics | 0|none | 0|acc |↑ | 0.4100|± |0.0494| |
|
| - college_medicine | 0|none | 0|acc |↑ | 0.6763|± |0.0357| |
|
| - college_physics | 0|none | 0|acc |↑ | 0.5000|± |0.0498| |
|
| - computer_security | 0|none | 0|acc |↑ | 0.8200|± |0.0386| |
|
| - conceptual_physics | 0|none | 0|acc |↑ | 0.7489|± |0.0283| |
|
| - econometrics | 0|none | 0|acc |↑ | 0.5877|± |0.0463| |
|
| - electrical_engineering | 0|none | 0|acc |↑ | 0.6759|± |0.0390| |
|
| - elementary_mathematics | 0|none | 0|acc |↑ | 0.6481|± |0.0246| |
|
| - formal_logic | 0|none | 0|acc |↑ | 0.5873|± |0.0440| |
|
| - global_facts | 0|none | 0|acc |↑ | 0.3900|± |0.0490| |
|
| - high_school_biology | 0|none | 0|acc |↑ | 0.8613|± |0.0197| |
|
| - high_school_chemistry | 0|none | 0|acc |↑ | 0.6453|± |0.0337| |
|
| - high_school_computer_science | 0|none | 0|acc |↑ | 0.8300|± |0.0378| |
|
| - high_school_european_history | 0|none | 0|acc |↑ | 0.8182|± |0.0301| |
|
| - high_school_geography | 0|none | 0|acc |↑ | 0.8485|± |0.0255| |
|
| - high_school_government_and_politics| 0|none | 0|acc |↑ | 0.8964|± |0.0220| |
|
| - high_school_macroeconomics | 0|none | 0|acc |↑ | 0.7923|± |0.0206| |
|
| - high_school_mathematics | 0|none | 0|acc |↑ | 0.4407|± |0.0303| |
|
| - high_school_microeconomics | 0|none | 0|acc |↑ | 0.8655|± |0.0222| |
|
| - high_school_physics | 0|none | 0|acc |↑ | 0.5298|± |0.0408| |
|
| - high_school_psychology | 0|none | 0|acc |↑ | 0.8679|± |0.0145| |
|
| - high_school_statistics | 0|none | 0|acc |↑ | 0.6898|± |0.0315| |
|
| - high_school_us_history | 0|none | 0|acc |↑ | 0.8873|± |0.0222| |
|
| - high_school_world_history | 0|none | 0|acc |↑ | 0.8312|± |0.0244| |
|
| - human_aging | 0|none | 0|acc |↑ | 0.7085|± |0.0305| |
|
| - human_sexuality | 0|none | 0|acc |↑ | 0.7557|± |0.0377| |
|
| - humanities |N/A |none | 0|acc |↑ | 0.6323|± |0.0067| |
|
| - international_law | 0|none | 0|acc |↑ | 0.8099|± |0.0358| |
|
| - jurisprudence | 0|none | 0|acc |↑ | 0.7685|± |0.0408| |
|
| - logical_fallacies | 0|none | 0|acc |↑ | 0.7975|± |0.0316| |
|
| - machine_learning | 0|none | 0|acc |↑ | 0.5179|± |0.0474| |
|
| - management | 0|none | 0|acc |↑ | 0.8835|± |0.0318| |
|
| - marketing | 0|none | 0|acc |↑ | 0.9017|± |0.0195| |
|
| - medical_genetics | 0|none | 0|acc |↑ | 0.8000|± |0.0402| |
|
| - miscellaneous | 0|none | 0|acc |↑ | 0.8225|± |0.0137| |
|
| - moral_disputes | 0|none | 0|acc |↑ | 0.7283|± |0.0239| |
|
| - moral_scenarios | 0|none | 0|acc |↑ | 0.4860|± |0.0167| |
|
| - nutrition | 0|none | 0|acc |↑ | 0.7353|± |0.0253| |
|
| - other |N/A |none | 0|acc |↑ | 0.7287|± |0.0077| |
|
| - philosophy | 0|none | 0|acc |↑ | 0.7170|± |0.0256| |
|
| - prehistory | 0|none | 0|acc |↑ | 0.7346|± |0.0246| |
|
| - professional_accounting | 0|none | 0|acc |↑ | 0.5638|± |0.0296| |
|
| - professional_law | 0|none | 0|acc |↑ | 0.5163|± |0.0128| |
|
| - professional_medicine | 0|none | 0|acc |↑ | 0.6875|± |0.0282| |
|
| - professional_psychology | 0|none | 0|acc |↑ | 0.7092|± |0.0184| |
|
| - public_relations | 0|none | 0|acc |↑ | 0.6727|± |0.0449| |
|
| - security_studies | 0|none | 0|acc |↑ | 0.7347|± |0.0283| |
|
| - social_sciences |N/A |none | 0|acc |↑ | 0.7910|± |0.0072| |
|
| - sociology | 0|none | 0|acc |↑ | 0.8060|± |0.0280| |
|
| - stem |N/A |none | 0|acc |↑ | 0.6581|± |0.0081| |
|
| - us_foreign_policy | 0|none | 0|acc |↑ | 0.8900|± |0.0314| |
|
| - virology | 0|none | 0|acc |↑ | 0.5301|± |0.0389| |
|
| - world_religions | 0|none | 0|acc |↑ | 0.8012|± |0.0306| |
|
|openbookqa | 1|none | 0|acc |↑ | 0.3280|± |0.0210| |
|
| | |none | 0|acc_norm |↑ | 0.4360|± |0.0222| |
|
|piqa | 1|none | 0|acc |↑ | 0.7982|± |0.0094| |
|
| | |none | 0|acc_norm |↑ | 0.8074|± |0.0092| |
|
|truthfulqa |N/A |none | 0|acc |↑ | 0.4746|± |0.0116| |
|
| | |none | 0|bleu_acc |↑ | 0.4700|± |0.0175| |
|
| | |none | 0|bleu_diff |↑ | 0.3214|± |0.6045| |
|
| | |none | 0|bleu_max |↑ |22.5895|± |0.7122| |
|
| | |none | 0|rouge1_acc |↑ | 0.4798|± |0.0175| |
|
| | |none | 0|rouge1_diff |↑ | 0.0846|± |0.7161| |
|
| | |none | 0|rouge1_max |↑ |48.7180|± |0.7833| |
|
| | |none | 0|rouge2_acc |↑ | 0.4149|± |0.0172| |
|
| | |none | 0|rouge2_diff |↑ |-0.4656|± |0.8375| |
|
| | |none | 0|rouge2_max |↑ |34.0585|± |0.8974| |
|
| | |none | 0|rougeL_acc |↑ | 0.4651|± |0.0175| |
|
| | |none | 0|rougeL_diff |↑ |-0.2804|± |0.7217| |
|
| | |none | 0|rougeL_max |↑ |45.2232|± |0.7971| |
|
| - truthfulqa_gen | 3|none | 0|bleu_acc |↑ | 0.4700|± |0.0175| |
|
| | |none | 0|bleu_diff |↑ | 0.3214|± |0.6045| |
|
| | |none | 0|bleu_max |↑ |22.5895|± |0.7122| |
|
| | |none | 0|rouge1_acc |↑ | 0.4798|± |0.0175| |
|
| | |none | 0|rouge1_diff |↑ | 0.0846|± |0.7161| |
|
| | |none | 0|rouge1_max |↑ |48.7180|± |0.7833| |
|
| | |none | 0|rouge2_acc |↑ | 0.4149|± |0.0172| |
|
| | |none | 0|rouge2_diff |↑ |-0.4656|± |0.8375| |
|
| | |none | 0|rouge2_max |↑ |34.0585|± |0.8974| |
|
| | |none | 0|rougeL_acc |↑ | 0.4651|± |0.0175| |
|
| | |none | 0|rougeL_diff |↑ |-0.2804|± |0.7217| |
|
| | |none | 0|rougeL_max |↑ |45.2232|± |0.7971| |
|
| - truthfulqa_mc1 | 2|none | 0|acc |↑ | 0.3905|± |0.0171| |
|
| - truthfulqa_mc2 | 2|none | 0|acc |↑ | 0.5587|± |0.0156| |
|
|winogrande | 1|none | 0|acc |↑ | 0.7388|± |0.0123| |
|
|
|
| Groups |Version|Filter|n-shot| Metric | | Value | |Stderr| |
|
|------------------|-------|------|-----:|-----------|---|------:|---|-----:| |
|
|agieval |N/A |none | 0|acc |↑ | 0.5381|± |0.0049| |
|
| | |none | 0|acc_norm |↑ | 0.5715|± |0.0056| |
|
|mmlu |N/A |none | 0|acc |↑ | 0.6942|± |0.0037| |
|
| - humanities |N/A |none | 0|acc |↑ | 0.6323|± |0.0067| |
|
| - other |N/A |none | 0|acc |↑ | 0.7287|± |0.0077| |
|
| - social_sciences|N/A |none | 0|acc |↑ | 0.7910|± |0.0072| |
|
| - stem |N/A |none | 0|acc |↑ | 0.6581|± |0.0081| |
|
|truthfulqa |N/A |none | 0|acc |↑ | 0.4746|± |0.0116| |
|
| | |none | 0|bleu_acc |↑ | 0.4700|± |0.0175| |
|
| | |none | 0|bleu_diff |↑ | 0.3214|± |0.6045| |
|
| | |none | 0|bleu_max |↑ |22.5895|± |0.7122| |
|
| | |none | 0|rouge1_acc |↑ | 0.4798|± |0.0175| |
|
| | |none | 0|rouge1_diff|↑ | 0.0846|± |0.7161| |
|
| | |none | 0|rouge1_max |↑ |48.7180|± |0.7833| |
|
| | |none | 0|rouge2_acc |↑ | 0.4149|± |0.0172| |
|
| | |none | 0|rouge2_diff|↑ |-0.4656|± |0.8375| |
|
| | |none | 0|rouge2_max |↑ |34.0585|± |0.8974| |
|
| | |none | 0|rougeL_acc |↑ | 0.4651|± |0.0175| |
|
| | |none | 0|rougeL_diff|↑ |-0.2804|± |0.7217| |
|
| | |none | 0|rougeL_max |↑ |45.2232|± |0.7971| |