Qwen/Qwen2.5-Math-7B-PRM800K

Zhenru committed 19 days ago

Commit 0d04bd1 · verified · 1 Parent(s): 42fa761

Update README.md

Files changed (1)

README.md +1 -1

@@ -47,7 +47,7 @@ For requirements on GPU memory and the respective throughput, see similar result
 > **Qwen2.5-Math-7B-PRM800K** is a process reward model typically used for offering feedback on the quality of reasoning and intermediate steps rather than generation.
 ### Prerequisites
-- Step Separation: We recommend using double line breaks ("\n\n") to separate individual steps within the solution.
+- Step Separation: We recommend using double line breaks ("\n\n") to separate individual steps within the solution if using responses from Qwen2.5-Math-Instruct.
 - Reward Computation: After each step, we insert a special token "`<extra_0>`". For reward calculation, we extract the probability score of this token being classified as positive, resulting in a reward value between 0 and 1.
 ### 🤗 Hugging Face Transformers
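
The two prerequisites in this hunk can be illustrated with a minimal, model-free sketch. It is not the README's own code: the helper names (`split_steps`, `mark_steps`, `positive_probability`) are hypothetical, and it assumes the model emits one (negative, positive) logit pair at each `<extra_0>` position, so that the reward is a two-class softmax over that pair.

```python
import math

STEP_TOKEN = "<extra_0>"  # special token named in the README


def split_steps(response, sep="\n\n"):
    """Split a response into steps on double line breaks, dropping empty pieces."""
    return [part.strip() for part in response.split(sep) if part.strip()]


def mark_steps(steps):
    """Concatenate the steps, inserting the step token after each one."""
    return "".join(step + STEP_TOKEN for step in steps)


def positive_probability(neg_logit, pos_logit):
    """Two-class softmax; returns the probability of the positive class.

    Assumption: the model yields a (negative, positive) logit pair at each
    step-token position. Subtracting the max keeps exp() numerically stable.
    """
    top = max(neg_logit, pos_logit)
    e_neg = math.exp(neg_logit - top)
    e_pos = math.exp(pos_logit - top)
    return e_pos / (e_neg + e_pos)


def step_rewards(logit_pairs):
    """Map one logit pair per step to one reward in [0, 1] per step."""
    return [positive_probability(neg, pos) for neg, pos in logit_pairs]


if __name__ == "__main__":
    response = "First, expand the square.\n\nThen collect like terms."
    steps = split_steps(response)
    print(mark_steps(steps))
    # Made-up logits, for illustration only; real values come from the model.
    print(step_rewards([(-1.0, 2.0), (0.5, 0.5)]))
```

In practice the logit pairs would be read from the model output at the positions of the `<extra_0>` tokens. The sketch only shows why each step ends up with a single reward between 0 and 1.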