Zhenru committed on
Commit 0d04bd1 · verified · 1 Parent(s): 42fa761

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -47,7 +47,7 @@ For requirements on GPU memory and the respective throughput, see similar result
  > **Qwen2.5-Math-7B-PRM800K** is a process reward model typically used for offering feedback on the quality of reasoning and intermediate steps rather than generation.

  ### Prerequisites
- - Step Separation: We recommend using double line breaks ("\n\n") to separate individual steps within the solution.
+ - Step Separation: We recommend using double line breaks ("\n\n") to separate individual steps within the solution if using responses from Qwen2.5-Math-Instruct.
  - Reward Computation: After each step, we insert a special token "`<extra_0>`". For reward calculation, we extract the probability score of this token being classified as positive, resulting in a reward value between 0 and 1.

  ### 🤗 Hugging Face Transformers
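
The two Prerequisites items touched by this change can be illustrated with a minimal, model-free sketch. It only shows the string handling and the arithmetic: splitting a response on `"\n\n"`, appending `<extra_0>` after each step, and turning a pair of class logits into a score between 0 and 1. The helper names `format_steps` and `positive_probability` are hypothetical, and the two-logit (negative, positive) layout is an assumption made for illustration; the diff does not confirm how the model exposes its classification output.

```python
import math

# Special token named in the README; it marks the end of each step.
STEP_TOKEN = "<extra_0>"


def format_steps(solution: str) -> str:
    """Split a solution on double line breaks and append STEP_TOKEN to each step.

    Hypothetical helper: the README recommends "\n\n" as the separator for
    responses produced by Qwen2.5-Math-Instruct.
    """
    steps = [s.strip() for s in solution.split("\n\n") if s.strip()]
    return "".join(step + STEP_TOKEN for step in steps)


def positive_probability(neg_logit: float, pos_logit: float) -> float:
    """Two-class softmax: probability of the positive class.

    Assumption: each STEP_TOKEN position yields one negative and one
    positive logit. The result always lies between 0 and 1.
    """
    m = max(neg_logit, pos_logit)  # subtract the max for numerical stability
    e_neg = math.exp(neg_logit - m)
    e_pos = math.exp(pos_logit - m)
    return e_pos / (e_neg + e_pos)


if __name__ == "__main__":
    response = "Step one.\n\nStep two."
    print(format_steps(response))  # Step one.<extra_0>Step two.<extra_0>
    print(positive_probability(0.0, 0.0))  # 0.5
```

Responses that do not come from Qwen2.5-Math-Instruct may not use `"\n\n"` between steps, which is the case the updated wording calls out; such responses would need their own step-splitting rule before the token is inserted.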