Hermes-2.5-Yi-1.5-9B-Chat

This model is a fine-tuned version of 01-ai/Yi-1.5-9B-Chat on the teknium/OpenHermes-2.5 dataset. I'm very happy with the results. The model now seems a lot smarter and more "aware" in certain situations (this is a first impression, so I might change my opinion with more usage). It has quite a big edge on the AGIEval benchmark for models in its class. I plan to extend its context length to 32k with PoSE.

Model Details

  • Base Model: 01-ai/Yi-1.5-9B-Chat
  • Chat Template: ChatML
  • Dataset: teknium/OpenHermes-2.5
  • Sequence Length: 8192 tokens
  • Training:
      • Epochs: 1
      • Hardware: 4 Nodes x 4 NVIDIA A100 40GB GPUs
      • Duration: 48:32:13
      • Cluster: KIT SCC Cluster
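
To put the training figures above in perspective, here is a rough sketch of the total compute in GPU-hours. It assumes the listed duration is wall-clock time in HH:MM:SS and that all 16 GPUs were busy for the whole run; neither is stated explicitly, and the helper name `gpu_hours` is made up for illustration.

```python
def gpu_hours(duration: str, nodes: int, gpus_per_node: int) -> float:
    """Convert an HH:MM:SS wall-clock duration into total GPU-hours."""
    hours, minutes, seconds = (int(part) for part in duration.split(":"))
    wall_clock_hours = hours + minutes / 60 + seconds / 3600
    # Assumption: every GPU on every node ran for the full duration.
    return wall_clock_hours * nodes * gpus_per_node


total = gpu_hours("48:32:13", nodes=4, gpus_per_node=4)
print(f"{total:.1f} GPU-hours")  # 776.6 GPU-hours
```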

Benchmark n_shots=0

Benchmark          Score
ARC (Challenge)    52.47%
ARC (Easy)         81.65%
BoolQ              87.22%
HellaSwag          60.52%
OpenBookQA         33.60%
PIQA               81.12%
Winogrande         72.22%
AGIEval            38.46%
TruthfulQA         44.22%
MMLU               59.72%
IFEval             47.96%

For detailed benchmark results, including sub-categories and various metrics, please refer to the full benchmark table at the end of this README.
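
The full table lists each value together with a standard error. As a reading aid, the sketch below turns one such pair into an approximate 95% interval. It assumes a normal approximation (z = 1.96), which is a common convention and not something this README specifies; the function name is hypothetical.

```python
def confidence_interval(value: float, stderr: float, z: float = 1.96) -> tuple[float, float]:
    """Return an approximate (low, high) interval around a benchmark score."""
    margin = z * stderr  # normal approximation, assumed here
    return value - margin, value + margin


# MMLU group accuracy and standard error as listed in the full table.
low, high = confidence_interval(0.6942, 0.0037)
print(f"MMLU acc 0.6942, approx. 95% interval [{low:.4f}, {high:.4f}]")
```

Scores whose intervals overlap heavily are hard to tell apart, which matters when comparing models on small subtasks with large standard errors.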

GGUF and Quantizations

Usage

To use this model, you can load it using the Hugging Face Transformers library:

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("juvi21/Hermes-2.5-Yi-1.5-9B-Chat")
tokenizer = AutoTokenizer.from_pretrained("juvi21/Hermes-2.5-Yi-1.5-9B-Chat")

# Generate text (max_new_tokens caps the length of the reply; adjust as needed)
input_text = "What is the question to 42?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)

chatml

<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
Knock Knock, who is there?<|im_end|>
<|im_start|>assistant
Hi there! <|im_end|>
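
The layout above can be assembled from a list of messages. The following is a minimal sketch in plain Python that mirrors the template shown; `build_chatml_prompt` is a hypothetical helper and the system prompt is a placeholder. If the tokenizer in this repository ships a chat template (an assumption, not verified here), `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` from Transformers should produce an equivalent string.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},  # placeholder
    {"role": "user", "content": "Knock Knock, who is there?"},
]
prompt = build_chatml_prompt(messages)
print(prompt)
```

The resulting string can be passed to the tokenizer in place of a raw question, so the model sees the same turn structure it was fine-tuned on.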

License

This model is released under the Apache 2.0 license.

Acknowledgements

Special thanks to:

  • Teknium for the great OpenHermes-2.5 dataset
  • 01-ai for their great model
  • KIT SCC for FLOPS

Citation

If you use this model in your research, consider citing it. Either way, definitely cite NousResearch and 01-ai:

@misc{juvi21_hermes_2_5_yi_1_5_9b_chat,
  author = {juvi21},
  title = {Hermes-2.5-Yi-1.5-9B-Chat},
  year = {2024}
}

Full Benchmark Results

Tasks Version Filter n-shot Metric Value Stderr
agieval N/A none 0 acc ↑ 0.5381 ± 0.0049
none 0 acc_norm ↑ 0.5715 ± 0.0056
- agieval_aqua_rat 1 none 0 acc ↑ 0.3858 ± 0.0306
none 0 acc_norm ↑ 0.3425 ± 0.0298
- agieval_gaokao_biology 1 none 0 acc ↑ 0.6048 ± 0.0338
none 0 acc_norm ↑ 0.6000 ± 0.0339
- agieval_gaokao_chemistry 1 none 0 acc ↑ 0.4879 ± 0.0348
none 0 acc_norm ↑ 0.4106 ± 0.0343
- agieval_gaokao_chinese 1 none 0 acc ↑ 0.5935 ± 0.0314
none 0 acc_norm ↑ 0.5813 ± 0.0315
- agieval_gaokao_english 1 none 0 acc ↑ 0.8235 ± 0.0218
none 0 acc_norm ↑ 0.8431 ± 0.0208
- agieval_gaokao_geography 1 none 0 acc ↑ 0.7085 ± 0.0323
none 0 acc_norm ↑ 0.6985 ± 0.0326
- agieval_gaokao_history 1 none 0 acc ↑ 0.7830 ± 0.0269
none 0 acc_norm ↑ 0.7660 ± 0.0277
- agieval_gaokao_mathcloze 1 none 0 acc ↑ 0.0508 ± 0.0203
- agieval_gaokao_mathqa 1 none 0 acc ↑ 0.3761 ± 0.0259
none 0 acc_norm ↑ 0.3590 ± 0.0256
- agieval_gaokao_physics 1 none 0 acc ↑ 0.4950 ± 0.0354
none 0 acc_norm ↑ 0.4700 ± 0.0354
- agieval_jec_qa_ca 1 none 0 acc ↑ 0.6557 ± 0.0150
none 0 acc_norm ↑ 0.5926 ± 0.0156
- agieval_jec_qa_kd 1 none 0 acc ↑ 0.7310 ± 0.0140
none 0 acc_norm ↑ 0.6610 ± 0.0150
- agieval_logiqa_en 1 none 0 acc ↑ 0.5177 ± 0.0196
none 0 acc_norm ↑ 0.4839 ± 0.0196
- agieval_logiqa_zh 1 none 0 acc ↑ 0.4854 ± 0.0196
none 0 acc_norm ↑ 0.4501 ± 0.0195
- agieval_lsat_ar 1 none 0 acc ↑ 0.2913 ± 0.0300
none 0 acc_norm ↑ 0.2696 ± 0.0293
- agieval_lsat_lr 1 none 0 acc ↑ 0.7196 ± 0.0199
none 0 acc_norm ↑ 0.6824 ± 0.0206
- agieval_lsat_rc 1 none 0 acc ↑ 0.7212 ± 0.0274
none 0 acc_norm ↑ 0.6989 ± 0.0280
- agieval_math 1 none 0 acc ↑ 0.0910 ± 0.0091
- agieval_sat_en 1 none 0 acc ↑ 0.8204 ± 0.0268
none 0 acc_norm ↑ 0.8301 ± 0.0262
- agieval_sat_en_without_passage 1 none 0 acc ↑ 0.5194 ± 0.0349
none 0 acc_norm ↑ 0.4806 ± 0.0349
- agieval_sat_math 1 none 0 acc ↑ 0.5864 ± 0.0333
none 0 acc_norm ↑ 0.5409 ± 0.0337
arc_challenge 1 none 0 acc ↑ 0.5648 ± 0.0145
none 0 acc_norm ↑ 0.5879 ± 0.0144
arc_easy 1 none 0 acc ↑ 0.8241 ± 0.0078
none 0 acc_norm ↑ 0.8165 ± 0.0079
boolq 2 none 0 acc ↑ 0.8624 ± 0.0060
hellaswag 1 none 0 acc ↑ 0.5901 ± 0.0049
none 0 acc_norm ↑ 0.7767 ± 0.0042
ifeval 2 none 0 inst_level_loose_acc ↑ 0.5156 ± N/A
none 0 inst_level_strict_acc ↑ 0.4748 ± N/A
none 0 prompt_level_loose_acc ↑ 0.3863 ± 0.0210
none 0 prompt_level_strict_acc ↑ 0.3309 ± 0.0202
mmlu N/A none 0 acc ↑ 0.6942 ± 0.0037
- abstract_algebra 0 none 0 acc ↑ 0.4900 ± 0.0502
- anatomy 0 none 0 acc ↑ 0.6815 ± 0.0402
- astronomy 0 none 0 acc ↑ 0.7895 ± 0.0332
- business_ethics 0 none 0 acc ↑ 0.7600 ± 0.0429
- clinical_knowledge 0 none 0 acc ↑ 0.7132 ± 0.0278
- college_biology 0 none 0 acc ↑ 0.8056 ± 0.0331
- college_chemistry 0 none 0 acc ↑ 0.5300 ± 0.0502
- college_computer_science 0 none 0 acc ↑ 0.6500 ± 0.0479
- college_mathematics 0 none 0 acc ↑ 0.4100 ± 0.0494
- college_medicine 0 none 0 acc ↑ 0.6763 ± 0.0357
- college_physics 0 none 0 acc ↑ 0.5000 ± 0.0498
- computer_security 0 none 0 acc ↑ 0.8200 ± 0.0386
- conceptual_physics 0 none 0 acc ↑ 0.7489 ± 0.0283
- econometrics 0 none 0 acc ↑ 0.5877 ± 0.0463
- electrical_engineering 0 none 0 acc ↑ 0.6759 ± 0.0390
- elementary_mathematics 0 none 0 acc ↑ 0.6481 ± 0.0246
- formal_logic 0 none 0 acc ↑ 0.5873 ± 0.0440
- global_facts 0 none 0 acc ↑ 0.3900 ± 0.0490
- high_school_biology 0 none 0 acc ↑ 0.8613 ± 0.0197
- high_school_chemistry 0 none 0 acc ↑ 0.6453 ± 0.0337
- high_school_computer_science 0 none 0 acc ↑ 0.8300 ± 0.0378
- high_school_european_history 0 none 0 acc ↑ 0.8182 ± 0.0301
- high_school_geography 0 none 0 acc ↑ 0.8485 ± 0.0255
- high_school_government_and_politics 0 none 0 acc ↑ 0.8964 ± 0.0220
- high_school_macroeconomics 0 none 0 acc ↑ 0.7923 ± 0.0206
- high_school_mathematics 0 none 0 acc ↑ 0.4407 ± 0.0303
- high_school_microeconomics 0 none 0 acc ↑ 0.8655 ± 0.0222
- high_school_physics 0 none 0 acc ↑ 0.5298 ± 0.0408
- high_school_psychology 0 none 0 acc ↑ 0.8679 ± 0.0145
- high_school_statistics 0 none 0 acc ↑ 0.6898 ± 0.0315
- high_school_us_history 0 none 0 acc ↑ 0.8873 ± 0.0222
- high_school_world_history 0 none 0 acc ↑ 0.8312 ± 0.0244
- human_aging 0 none 0 acc ↑ 0.7085 ± 0.0305
- human_sexuality 0 none 0 acc ↑ 0.7557 ± 0.0377
- humanities N/A none 0 acc ↑ 0.6323 ± 0.0067
- international_law 0 none 0 acc ↑ 0.8099 ± 0.0358
- jurisprudence 0 none 0 acc ↑ 0.7685 ± 0.0408
- logical_fallacies 0 none 0 acc ↑ 0.7975 ± 0.0316
- machine_learning 0 none 0 acc ↑ 0.5179 ± 0.0474
- management 0 none 0 acc ↑ 0.8835 ± 0.0318
- marketing 0 none 0 acc ↑ 0.9017 ± 0.0195
- medical_genetics 0 none 0 acc ↑ 0.8000 ± 0.0402
- miscellaneous 0 none 0 acc ↑ 0.8225 ± 0.0137
- moral_disputes 0 none 0 acc ↑ 0.7283 ± 0.0239
- moral_scenarios 0 none 0 acc ↑ 0.4860 ± 0.0167
- nutrition 0 none 0 acc ↑ 0.7353 ± 0.0253
- other N/A none 0 acc ↑ 0.7287 ± 0.0077
- philosophy 0 none 0 acc ↑ 0.7170 ± 0.0256
- prehistory 0 none 0 acc ↑ 0.7346 ± 0.0246
- professional_accounting 0 none 0 acc ↑ 0.5638 ± 0.0296
- professional_law 0 none 0 acc ↑ 0.5163 ± 0.0128
- professional_medicine 0 none 0 acc ↑ 0.6875 ± 0.0282
- professional_psychology 0 none 0 acc ↑ 0.7092 ± 0.0184
- public_relations 0 none 0 acc ↑ 0.6727 ± 0.0449
- security_studies 0 none 0 acc ↑ 0.7347 ± 0.0283
- social_sciences N/A none 0 acc ↑ 0.7910 ± 0.0072
- sociology 0 none 0 acc ↑ 0.8060 ± 0.0280
- stem N/A none 0 acc ↑ 0.6581 ± 0.0081
- us_foreign_policy 0 none 0 acc ↑ 0.8900 ± 0.0314
- virology 0 none 0 acc ↑ 0.5301 ± 0.0389
- world_religions 0 none 0 acc ↑ 0.8012 ± 0.0306
openbookqa 1 none 0 acc ↑ 0.3280 ± 0.0210
none 0 acc_norm ↑ 0.4360 ± 0.0222
piqa 1 none 0 acc ↑ 0.7982 ± 0.0094
none 0 acc_norm ↑ 0.8074 ± 0.0092
truthfulqa N/A none 0 acc ↑ 0.4746 ± 0.0116
none 0 bleu_acc ↑ 0.4700 ± 0.0175
none 0 bleu_diff ↑ 0.3214 ± 0.6045
none 0 bleu_max ↑ 22.5895 ± 0.7122
none 0 rouge1_acc ↑ 0.4798 ± 0.0175
none 0 rouge1_diff ↑ 0.0846 ± 0.7161
none 0 rouge1_max ↑ 48.7180 ± 0.7833
none 0 rouge2_acc ↑ 0.4149 ± 0.0172
none 0 rouge2_diff ↑ -0.4656 ± 0.8375
none 0 rouge2_max ↑ 34.0585 ± 0.8974
none 0 rougeL_acc ↑ 0.4651 ± 0.0175
none 0 rougeL_diff ↑ -0.2804 ± 0.7217
none 0 rougeL_max ↑ 45.2232 ± 0.7971
- truthfulqa_gen 3 none 0 bleu_acc ↑ 0.4700 ± 0.0175
none 0 bleu_diff ↑ 0.3214 ± 0.6045
none 0 bleu_max ↑ 22.5895 ± 0.7122
none 0 rouge1_acc ↑ 0.4798 ± 0.0175
none 0 rouge1_diff ↑ 0.0846 ± 0.7161
none 0 rouge1_max ↑ 48.7180 ± 0.7833
none 0 rouge2_acc ↑ 0.4149 ± 0.0172
none 0 rouge2_diff ↑ -0.4656 ± 0.8375
none 0 rouge2_max ↑ 34.0585 ± 0.8974
none 0 rougeL_acc ↑ 0.4651 ± 0.0175
none 0 rougeL_diff ↑ -0.2804 ± 0.7217
none 0 rougeL_max ↑ 45.2232 ± 0.7971
- truthfulqa_mc1 2 none 0 acc ↑ 0.3905 ± 0.0171
- truthfulqa_mc2 2 none 0 acc ↑ 0.5587 ± 0.0156
winogrande 1 none 0 acc ↑ 0.7388 ± 0.0123
Groups Version Filter n-shot Metric Value Stderr
agieval N/A none 0 acc ↑ 0.5381 ± 0.0049
none 0 acc_norm ↑ 0.5715 ± 0.0056
mmlu N/A none 0 acc ↑ 0.6942 ± 0.0037
- humanities N/A none 0 acc ↑ 0.6323 ± 0.0067
- other N/A none 0 acc ↑ 0.7287 ± 0.0077
- social_sciences N/A none 0 acc ↑ 0.7910 ± 0.0072
- stem N/A none 0 acc ↑ 0.6581 ± 0.0081
truthfulqa N/A none 0 acc ↑ 0.4746 ± 0.0116
none 0 bleu_acc ↑ 0.4700 ± 0.0175
none 0 bleu_diff ↑ 0.3214 ± 0.6045
none 0 bleu_max ↑ 22.5895 ± 0.7122
none 0 rouge1_acc ↑ 0.4798 ± 0.0175
none 0 rouge1_diff ↑ 0.0846 ± 0.7161
none 0 rouge1_max ↑ 48.7180 ± 0.7833
none 0 rouge2_acc ↑ 0.4149 ± 0.0172
none 0 rouge2_diff ↑ -0.4656 ± 0.8375
none 0 rouge2_max ↑ 34.0585 ± 0.8974
none 0 rougeL_acc ↑ 0.4651 ± 0.0175
none 0 rougeL_diff ↑ -0.2804 ± 0.7217
none 0 rougeL_max ↑ 45.2232 ± 0.7971