---
license: apache-2.0
datasets:
  - BAAI/Infinity-Instruct
tags:
  - axolotl
  - NousResearch/Hermes-2-Pro-Mistral-7B
  - finetune
---

# Hermes 2 Pro Mistral-7B Infinity-Instruct

This model is a fine-tuned version of NousResearch/Hermes-2-Pro-Mistral-7B on the BAAI/Infinity-Instruct dataset.

## Model Details

- **Base Model:** NousResearch/Hermes-2-Pro-Mistral-7B
- **Dataset:** BAAI/Infinity-Instruct
- **Sequence Length:** 8192 tokens
- **Training:**
  - Epochs: 1
  - Hardware: 4 nodes × 4 NVIDIA A100 40GB GPUs
  - Duration: 26:56:43 (hh:mm:ss)
  - Cluster: KIT SCC Cluster

## Benchmark Results (n-shot = 0)

| Benchmark | Score |
|-----------|------:|
| ARC (Challenge) | 52.47% |
| ARC (Easy) | 81.65% |
| BoolQ | 87.22% |
| HellaSwag | 60.52% |
| OpenBookQA | 33.60% |
| PIQA | 81.12% |
| Winogrande | 72.22% |
| AGIEval | 38.46% |
| TruthfulQA | 44.22% |
| MMLU | 59.72% |
| IFEval | 47.96% |

For detailed benchmark results, including sub-categories and various metrics, please refer to the full benchmark table at the end of this README.
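The results follow the output format of EleutherAI's lm-evaluation-harness. As a rough sketch, a zero-shot run like this could be reproduced with the harness's Python API; the task list below is a small illustrative subset, and `your-model-name` stands in for the repository id:

```python
# Sketch of a zero-shot evaluation with EleutherAI's lm-evaluation-harness
# (pip install lm-eval). The task list is an illustrative subset, not
# necessarily the exact set used for the tables in this README.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=your-model-name",  # replace with the model id
    tasks=["arc_challenge", "arc_easy", "boolq", "hellaswag", "winogrande"],
    num_fewshot=0,  # matches the n-shot = 0 setting reported here
)
print(results["results"])
```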

## GGUF and Quantizations
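If a GGUF quantization of this model is available, it can be run locally with llama.cpp. A minimal sketch using the llama-cpp-python bindings; the GGUF file name below is hypothetical:

```python
# Minimal sketch with llama-cpp-python (pip install llama-cpp-python).
# The GGUF file name is hypothetical; substitute the quantization you downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="hermes-2-pro-mistral-7b-infinity.Q4_K_M.gguf",  # hypothetical file
    n_ctx=8192,  # matches the model's 8192-token sequence length
)
output = llm("Your prompt here", max_tokens=128)
print(output["choices"][0]["text"])
```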

## Usage

Load the model with the Hugging Face Transformers library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Replace "your-model-name" with this repository's model id
model = AutoModelForCausalLM.from_pretrained("your-model-name")
tokenizer = AutoTokenizer.from_pretrained("your-model-name")

# Generate text
input_text = "Your prompt here"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)  # the default generation length is very short
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
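Because the Hermes 2 Pro base model was trained on the ChatML prompt format, chat-style prompting through the tokenizer's chat template is usually more reliable than raw text. A sketch, continuing from the snippet above and assuming the chat template is shipped with the tokenizer:

```python
# Chat-style prompting via the tokenizer's chat template. This assumes
# the ChatML template is inherited from the Hermes 2 Pro base model.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the Infinity-Instruct dataset in one sentence."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```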

## License

This model is released under the Apache 2.0 license.

## Acknowledgements

Special thanks to:

- NousResearch for their excellent base model
- BAAI for providing the Infinity-Instruct dataset

## Citation

If you use this model in your research, please consider citing it; in any case, be sure to cite NousResearch and BAAI:

```bibtex
@misc{hermes2pro-mistral-7b-infinity,
  author = {juvi21},
  title  = {Hermes 2 Pro Mistral-7B Infinity-Instruct},
  year   = {2024},
}
```

## Full Benchmark Results

| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|---|---:|
| agieval | N/A | none | 0 | acc | 0.3846 | ± | 0.0051 |
| | | none | 0 | acc_norm | 0.4186 | ± | 0.0056 |
| - agieval_aqua_rat | 1 | none | 0 | acc | 0.2520 | ± | 0.0273 |
| | | none | 0 | acc_norm | 0.2323 | ± | 0.0265 |
| - agieval_gaokao_biology | 1 | none | 0 | acc | 0.2952 | ± | 0.0316 |
| | | none | 0 | acc_norm | 0.3381 | ± | 0.0327 |
| - agieval_gaokao_chemistry | 1 | none | 0 | acc | 0.2560 | ± | 0.0304 |
| | | none | 0 | acc_norm | 0.2850 | ± | 0.0315 |
| - agieval_gaokao_chinese | 1 | none | 0 | acc | 0.2317 | ± | 0.0270 |
| | | none | 0 | acc_norm | 0.2236 | ± | 0.0266 |
| - agieval_gaokao_english | 1 | none | 0 | acc | 0.6667 | ± | 0.0270 |
| | | none | 0 | acc_norm | 0.6863 | ± | 0.0266 |
| - agieval_gaokao_geography | 1 | none | 0 | acc | 0.3869 | ± | 0.0346 |
| | | none | 0 | acc_norm | 0.4020 | ± | 0.0348 |
| - agieval_gaokao_history | 1 | none | 0 | acc | 0.4468 | ± | 0.0325 |
| | | none | 0 | acc_norm | 0.3957 | ± | 0.0320 |
| - agieval_gaokao_mathcloze | 1 | none | 0 | acc | 0.0254 | ± | 0.0146 |
| - agieval_gaokao_mathqa | 1 | none | 0 | acc | 0.2507 | ± | 0.0232 |
| | | none | 0 | acc_norm | 0.2621 | ± | 0.0235 |
| - agieval_gaokao_physics | 1 | none | 0 | acc | 0.2900 | ± | 0.0322 |
| | | none | 0 | acc_norm | 0.3100 | ± | 0.0328 |
| - agieval_jec_qa_ca | 1 | none | 0 | acc | 0.4735 | ± | 0.0158 |
| | | none | 0 | acc_norm | 0.4695 | ± | 0.0158 |
| - agieval_jec_qa_kd | 1 | none | 0 | acc | 0.5290 | ± | 0.0158 |
| | | none | 0 | acc_norm | 0.5140 | ± | 0.0158 |
| - agieval_logiqa_en | 1 | none | 0 | acc | 0.3579 | ± | 0.0188 |
| | | none | 0 | acc_norm | 0.3779 | ± | 0.0190 |
| - agieval_logiqa_zh | 1 | none | 0 | acc | 0.3103 | ± | 0.0181 |
| | | none | 0 | acc_norm | 0.3318 | ± | 0.0185 |
| - agieval_lsat_ar | 1 | none | 0 | acc | 0.2217 | ± | 0.0275 |
| | | none | 0 | acc_norm | 0.2217 | ± | 0.0275 |
| - agieval_lsat_lr | 1 | none | 0 | acc | 0.5333 | ± | 0.0221 |
| | | none | 0 | acc_norm | 0.5098 | ± | 0.0222 |
| - agieval_lsat_rc | 1 | none | 0 | acc | 0.5948 | ± | 0.0300 |
| | | none | 0 | acc_norm | 0.5353 | ± | 0.0305 |
| - agieval_math | 1 | none | 0 | acc | 0.1520 | ± | 0.0114 |
| - agieval_sat_en | 1 | none | 0 | acc | 0.7864 | ± | 0.0286 |
| | | none | 0 | acc_norm | 0.7621 | ± | 0.0297 |
| - agieval_sat_en_without_passage | 1 | none | 0 | acc | 0.4660 | ± | 0.0348 |
| | | none | 0 | acc_norm | 0.4272 | ± | 0.0345 |
| - agieval_sat_math | 1 | none | 0 | acc | 0.3591 | ± | 0.0324 |
| | | none | 0 | acc_norm | 0.3045 | ± | 0.0311 |
| arc_challenge | 1 | none | 0 | acc | 0.5247 | ± | 0.0146 |
| | | none | 0 | acc_norm | 0.5538 | ± | 0.0145 |
| arc_easy | 1 | none | 0 | acc | 0.8165 | ± | 0.0079 |
| | | none | 0 | acc_norm | 0.7934 | ± | 0.0083 |
| boolq | 2 | none | 0 | acc | 0.8722 | ± | 0.0058 |
| hellaswag | 1 | none | 0 | acc | 0.6052 | ± | 0.0049 |
| | | none | 0 | acc_norm | 0.7941 | ± | 0.0040 |
| ifeval | 2 | none | 0 | inst_level_loose_acc | 0.5132 | ± | N/A |
| | | none | 0 | inst_level_strict_acc | 0.4796 | ± | N/A |
| | | none | 0 | prompt_level_loose_acc | 0.4122 | ± | 0.0212 |
| | | none | 0 | prompt_level_strict_acc | 0.3734 | ± | 0.0208 |
| mmlu | N/A | none | 0 | acc | 0.5972 | ± | 0.0039 |
| - abstract_algebra | 0 | none | 0 | acc | 0.3100 | ± | 0.0465 |
| - anatomy | 0 | none | 0 | acc | 0.5852 | ± | 0.0426 |
| - astronomy | 0 | none | 0 | acc | 0.6447 | ± | 0.0389 |
| - business_ethics | 0 | none | 0 | acc | 0.5800 | ± | 0.0496 |
| - clinical_knowledge | 0 | none | 0 | acc | 0.6830 | ± | 0.0286 |
| - college_biology | 0 | none | 0 | acc | 0.7153 | ± | 0.0377 |
| - college_chemistry | 0 | none | 0 | acc | 0.4500 | ± | 0.0500 |
| - college_computer_science | 0 | none | 0 | acc | 0.4900 | ± | 0.0502 |
| - college_mathematics | 0 | none | 0 | acc | 0.3100 | ± | 0.0465 |
| - college_medicine | 0 | none | 0 | acc | 0.6069 | ± | 0.0372 |
| - college_physics | 0 | none | 0 | acc | 0.4020 | ± | 0.0488 |
| - computer_security | 0 | none | 0 | acc | 0.7200 | ± | 0.0451 |
| - conceptual_physics | 0 | none | 0 | acc | 0.5234 | ± | 0.0327 |
| - econometrics | 0 | none | 0 | acc | 0.4123 | ± | 0.0463 |
| - electrical_engineering | 0 | none | 0 | acc | 0.4759 | ± | 0.0416 |
| - elementary_mathematics | 0 | none | 0 | acc | 0.4180 | ± | 0.0254 |
| - formal_logic | 0 | none | 0 | acc | 0.4286 | ± | 0.0443 |
| - global_facts | 0 | none | 0 | acc | 0.3400 | ± | 0.0476 |
| - high_school_biology | 0 | none | 0 | acc | 0.7419 | ± | 0.0249 |
| - high_school_chemistry | 0 | none | 0 | acc | 0.4631 | ± | 0.0351 |
| - high_school_computer_science | 0 | none | 0 | acc | 0.6300 | ± | 0.0485 |
| - high_school_european_history | 0 | none | 0 | acc | 0.7394 | ± | 0.0343 |
| - high_school_geography | 0 | none | 0 | acc | 0.7323 | ± | 0.0315 |
| - high_school_government_and_politics | 0 | none | 0 | acc | 0.8238 | ± | 0.0275 |
| - high_school_macroeconomics | 0 | none | 0 | acc | 0.6308 | ± | 0.0245 |
| - high_school_mathematics | 0 | none | 0 | acc | 0.3333 | ± | 0.0287 |
| - high_school_microeconomics | 0 | none | 0 | acc | 0.6387 | ± | 0.0312 |
| - high_school_physics | 0 | none | 0 | acc | 0.2914 | ± | 0.0371 |
| - high_school_psychology | 0 | none | 0 | acc | 0.8128 | ± | 0.0167 |
| - high_school_statistics | 0 | none | 0 | acc | 0.4907 | ± | 0.0341 |
| - high_school_us_history | 0 | none | 0 | acc | 0.8186 | ± | 0.0270 |
| - high_school_world_history | 0 | none | 0 | acc | 0.8186 | ± | 0.0251 |
| - human_aging | 0 | none | 0 | acc | 0.6771 | ± | 0.0314 |
| - human_sexuality | 0 | none | 0 | acc | 0.7176 | ± | 0.0395 |
| - humanities | N/A | none | 0 | acc | 0.5411 | ± | 0.0066 |
| - international_law | 0 | none | 0 | acc | 0.7603 | ± | 0.0390 |
| - jurisprudence | 0 | none | 0 | acc | 0.7593 | ± | 0.0413 |
| - logical_fallacies | 0 | none | 0 | acc | 0.7239 | ± | 0.0351 |
| - machine_learning | 0 | none | 0 | acc | 0.5268 | ± | 0.0474 |
| - management | 0 | none | 0 | acc | 0.7864 | ± | 0.0406 |
| - marketing | 0 | none | 0 | acc | 0.8547 | ± | 0.0231 |
| - medical_genetics | 0 | none | 0 | acc | 0.6500 | ± | 0.0479 |
| - miscellaneous | 0 | none | 0 | acc | 0.7918 | ± | 0.0145 |
| - moral_disputes | 0 | none | 0 | acc | 0.6705 | ± | 0.0253 |
| - moral_scenarios | 0 | none | 0 | acc | 0.2268 | ± | 0.0140 |
| - nutrition | 0 | none | 0 | acc | 0.6961 | ± | 0.0263 |
| - other | N/A | none | 0 | acc | 0.6720 | ± | 0.0081 |
| - philosophy | 0 | none | 0 | acc | 0.6945 | ± | 0.0262 |
| - prehistory | 0 | none | 0 | acc | 0.6975 | ± | 0.0256 |
| - professional_accounting | 0 | none | 0 | acc | 0.4539 | ± | 0.0297 |
| - professional_law | 0 | none | 0 | acc | 0.4537 | ± | 0.0127 |
| - professional_medicine | 0 | none | 0 | acc | 0.6176 | ± | 0.0295 |
| - professional_psychology | 0 | none | 0 | acc | 0.6275 | ± | 0.0196 |
| - public_relations | 0 | none | 0 | acc | 0.6364 | ± | 0.0461 |
| - security_studies | 0 | none | 0 | acc | 0.7061 | ± | 0.0292 |
| - social_sciences | N/A | none | 0 | acc | 0.7043 | ± | 0.0080 |
| - sociology | 0 | none | 0 | acc | 0.8458 | ± | 0.0255 |
| - stem | N/A | none | 0 | acc | 0.5027 | ± | 0.0086 |
| - us_foreign_policy | 0 | none | 0 | acc | 0.8400 | ± | 0.0368 |
| - virology | 0 | none | 0 | acc | 0.5060 | ± | 0.0389 |
| - world_religions | 0 | none | 0 | acc | 0.8421 | ± | 0.0280 |
| openbookqa | 1 | none | 0 | acc | 0.3360 | ± | 0.0211 |
| | | none | 0 | acc_norm | 0.4380 | ± | 0.0222 |
| piqa | 1 | none | 0 | acc | 0.8112 | ± | 0.0091 |
| | | none | 0 | acc_norm | 0.8194 | ± | 0.0090 |
| truthfulqa | N/A | none | 0 | acc | 0.4422 | ± | 0.0113 |
| | | none | 0 | bleu_acc | 0.5398 | ± | 0.0174 |
| | | none | 0 | bleu_diff | 6.0075 | ± | 0.9539 |
| | | none | 0 | bleu_max | 30.9946 | ± | 0.8538 |
| | | none | 0 | rouge1_acc | 0.5545 | ± | 0.0174 |
| | | none | 0 | rouge1_diff | 8.7352 | ± | 1.2500 |
| | | none | 0 | rouge1_max | 57.5941 | ± | 0.8750 |
| | | none | 0 | rouge2_acc | 0.4810 | ± | 0.0175 |
| | | none | 0 | rouge2_diff | 7.9063 | ± | 1.3837 |
| | | none | 0 | rouge2_max | 43.4572 | ± | 1.0786 |
| | | none | 0 | rougeL_acc | 0.5239 | ± | 0.0175 |
| | | none | 0 | rougeL_diff | 8.3871 | ± | 1.2689 |
| | | none | 0 | rougeL_max | 54.6542 | ± | 0.9060 |
| - truthfulqa_gen | 3 | none | 0 | bleu_acc | 0.5398 | ± | 0.0174 |
| | | none | 0 | bleu_diff | 6.0075 | ± | 0.9539 |
| | | none | 0 | bleu_max | 30.9946 | ± | 0.8538 |
| | | none | 0 | rouge1_acc | 0.5545 | ± | 0.0174 |
| | | none | 0 | rouge1_diff | 8.7352 | ± | 1.2500 |
| | | none | 0 | rouge1_max | 57.5941 | ± | 0.8750 |
| | | none | 0 | rouge2_acc | 0.4810 | ± | 0.0175 |
| | | none | 0 | rouge2_diff | 7.9063 | ± | 1.3837 |
| | | none | 0 | rouge2_max | 43.4572 | ± | 1.0786 |
| | | none | 0 | rougeL_acc | 0.5239 | ± | 0.0175 |
| | | none | 0 | rougeL_diff | 8.3871 | ± | 1.2689 |
| | | none | 0 | rougeL_max | 54.6542 | ± | 0.9060 |
| - truthfulqa_mc1 | 2 | none | 0 | acc | 0.3574 | ± | 0.0168 |
| - truthfulqa_mc2 | 2 | none | 0 | acc | 0.5269 | ± | 0.0152 |
| winogrande | 1 | none | 0 | acc | 0.7222 | ± | 0.0126 |
| Groups | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|---|---:|
| agieval | N/A | none | 0 | acc | 0.3846 | ± | 0.0051 |
| | | none | 0 | acc_norm | 0.4186 | ± | 0.0056 |
| mmlu | N/A | none | 0 | acc | 0.5972 | ± | 0.0039 |
| - humanities | N/A | none | 0 | acc | 0.5411 | ± | 0.0066 |
| - other | N/A | none | 0 | acc | 0.6720 | ± | 0.0081 |
| - social_sciences | N/A | none | 0 | acc | 0.7043 | ± | 0.0080 |
| - stem | N/A | none | 0 | acc | 0.5027 | ± | 0.0086 |
| truthfulqa | N/A | none | 0 | acc | 0.4422 | ± | 0.0113 |
| | | none | 0 | bleu_acc | 0.5398 | ± | 0.0174 |
| | | none | 0 | bleu_diff | 6.0075 | ± | 0.9539 |
| | | none | 0 | bleu_max | 30.9946 | ± | 0.8538 |
| | | none | 0 | rouge1_acc | 0.5545 | ± | 0.0174 |
| | | none | 0 | rouge1_diff | 8.7352 | ± | 1.2500 |
| | | none | 0 | rouge1_max | 57.5941 | ± | 0.8750 |
| | | none | 0 | rouge2_acc | 0.4810 | ± | 0.0175 |
| | | none | 0 | rouge2_diff | 7.9063 | ± | 1.3837 |
| | | none | 0 | rouge2_max | 43.4572 | ± | 1.0786 |
| | | none | 0 | rougeL_acc | 0.5239 | ± | 0.0175 |
| | | none | 0 | rougeL_diff | 8.3871 | ± | 1.2689 |
| | | none | 0 | rougeL_max | 54.6542 | ± | 0.9060 |