ReasonEval-7B Model Card
Model Description
ReasonEval-7B is a 7B parameter decoder-only language model fine-tuned from WizardMath-7B-V1.1. Given a mathematical problem and the solution, ReasonEval-7B assesses the problem-solving process in a step-by-step format from the following perspectives:
- Validity: The step contains no mistakes in calculation and logic.
- Redundancy: The step lacks utility in solving the problem but is still valid.
With ReasonEval, you can:
- quantify the quality of reasoning steps without relying on human annotators or closed-source models.
- find potentially invalid or redundant steps in solutions, even when the final result is correct.
- select high-quality training data for downstream tasks (e.g., fine-tuning).
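As a minimal sketch of the second and third use cases, the snippet below flags steps and filters solutions once per-step validity and redundancy scores are available. The names `StepScore`, `flag_steps`, and `keep_solution` and the 0.5 thresholds are hypothetical choices for illustration and are not part of the ReasonEval codebase. The scores are assumed to lie in [0, 1].

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StepScore:
    """Hypothetical container for one step's scores, each assumed to be in [0, 1]."""
    validity: float
    redundancy: float


def flag_steps(scores: List[StepScore],
               validity_threshold: float = 0.5,
               redundancy_threshold: float = 0.5) -> List[str]:
    """Label each step as 'invalid', 'redundant', or 'ok'.

    The thresholds are illustrative assumptions and should be tuned per task.
    """
    flags = []
    for s in scores:
        if s.validity < validity_threshold:
            flags.append("invalid")
        elif s.redundancy > redundancy_threshold:
            flags.append("redundant")
        else:
            flags.append("ok")
    return flags


def keep_solution(scores: List[StepScore]) -> bool:
    """Keep a solution for training only if no step is flagged."""
    return all(flag == "ok" for flag in flag_steps(scores))


if __name__ == "__main__":
    # Made-up scores for a three-step solution.
    example = [StepScore(0.95, 0.05), StepScore(0.90, 0.80), StepScore(0.20, 0.10)]
    print(flag_steps(example))
    print(keep_solution(example))
```

A stricter or looser filter only requires changing the two thresholds, which makes it easy to trade data quantity against data quality.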
Model Details
- Model type: ReasonEval-7B's architecture is identical to WizardMath-7B-V1.1, except that the classification head for next-token prediction is replaced with a classification head that outputs the probability of each class of reasoning step.
- Language(s): English
- Paper: Evaluating Mathematical Reasoning Beyond Accuracy
- GitHub: https://github.com/GAIR-NLP/ReasonEval
- Finetuned from model: https://huggingface.co/WizardLM/WizardMath-7B-V1.1
- Fine-tuning Data: PRM800K
For detailed instructions on how to use the ReasonEval-7B model, visit our GitHub repository at https://github.com/GAIR-NLP/ReasonEval.
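To illustrate how a classification head's outputs could be turned into step-level scores, here is a rough sketch in plain Python. It rests on several assumptions that this card does not state: that the head emits three logits per step in the order (negative, neutral, positive), mirroring the PRM800K label set; that validity is scored as p(neutral) + p(positive) and redundancy as p(neutral); and that a solution is summarized by its minimum validity and maximum redundancy. The function names are hypothetical; the GitHub repository is the authoritative reference for actual usage.

```python
import math
from typing import Dict, List, Sequence

# Assumed class order of the head's logits (not confirmed by this card).
CLASS_ORDER = ("negative", "neutral", "positive")


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one step's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def step_scores(logits: Sequence[float]) -> Dict[str, float]:
    """Map one step's three logits to validity and redundancy scores.

    Assumed scoring rule: validity = p(neutral) + p(positive),
    redundancy = p(neutral).
    """
    p_negative, p_neutral, p_positive = softmax(logits)
    return {"validity": p_neutral + p_positive, "redundancy": p_neutral}


def solution_scores(all_logits: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Aggregate step scores: worst-case validity, worst-case redundancy."""
    per_step = [step_scores(step) for step in all_logits]
    return {
        "validity": min(s["validity"] for s in per_step),
        "redundancy": max(s["redundancy"] for s in per_step),
    }


if __name__ == "__main__":
    # Made-up logits for a two-step solution.
    logits = [[-2.0, 0.5, 3.0], [2.5, 0.0, -1.0]]
    for step in logits:
        print(step_scores(step))
    print(solution_scores(logits))
```

Taking the minimum validity means a single flawed step lowers the score of the whole solution, which matches the intuition that one invalid step can break a chain of reasoning.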
How to Cite
@article{xia2024evaluating,
title={Evaluating Mathematical Reasoning Beyond Accuracy},
author={Xia, Shijie and Li, Xuefeng and Liu, Yixin and Wu, Tongshuang and Liu, Pengfei},
journal={arXiv preprint arXiv:2404.05692},
year={2024},
}