Ask questions about training data construction
1
#8 opened 3 days ago
by
zzzzz2023
A question about the effectiveness of Qwen2.5-Math-PRM-7B in reinforcement learning
#7 opened 4 days ago
by
zsyyy
If the response length exceeds 4096, is a sliding window used, or is it simply truncated?
#6 opened 7 days ago
by
ShelterW
Question about the step separator "\n\n"
1
#3 opened 9 days ago
by
pixas
Could you clarify whether the PRM800K deduplication was performed using the original 5000-test set from MATH or the MATH500 dataset?
3
#2 opened 10 days ago
by
masterLan
vLLM support
1
#1 opened 10 days ago
by
baohao