MinghaoYang ytc0324 committed on
Commit 3b068e5 · verified · 1 Parent(s): d0d996d

Update README.md (#1)


- Update README.md (b5b1bf995dba5339b8c9349401e7687aba85e649)


Co-authored-by: tianchuyao <[email protected]>

Files changed (1)
  1. README.md +79 -56
README.md CHANGED
@@ -1,56 +1,79 @@
- ---
- license: apache-2.0
- ---
- Inference Demo
- ```python
- from transformers import AutoModel, AutoTokenizer
- import torch
- import json
-
- device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
-
- model_path = 'model_path'
-
- tokenizer = AutoTokenizer.from_pretrained(model_path)
- model = AutoModel.from_pretrained(
-     model_path,
-     device_map=device,
-     torch_dtype=torch.bfloat16,
-     trust_remote_code=True,
- ).eval()
-
- question = "It's April, and Mrs. Rylan has been busy on her farm planting different types of vegetables for the season. She has bought 20 packets of tomato seeds and 80 packets of celery seeds to plant. If a packet of tomato seeds costs $40 and a packet of celery seeds costs $30, how much money did she use to buy the seeds?"
- ground_truth_solution = "The total amount of money she used to buy the tomato seeds is 20 packets * $40/packet = $<<20*40=800>>800\nThe celery seeds cost her 80 packets * $30/packet = $<<80*30=2400>>2400\nFor the seeds, Mrs. Rylan paid $2400 + $800 = $<<2400+800=3200>>3200\n#### 3200"
- steps = ["To find out how much money Mrs. Rylan used to buy the seeds, we need to calculate the total cost of tomato seeds and celery seeds separately, then add them together.", "First, calculate the total cost of tomato seeds. Number of packets of tomato seeds = 20. Cost per packet of tomato seeds = $40. Total cost of tomato seeds = Number of packets of tomato seeds * Cost per packet of tomato seeds = 20 * $40 = $800.", "Second, calculate the total cost of celery seeds. Number of packets of celery seeds = 80. Cost per packet of celery seeds = $30. Total cost of celery seeds = Number of packets of celery seeds * Cost per packet of celery seeds = 80 * $30 = $2400.", "Finally, calculate the total amount of money used to buy the seeds. Total amount of money = Total cost of tomato seeds + Total cost of celery seeds = $800 + $2400 = $3200.", "Therefore, Mrs. Rylan used \\boxed{$3200} to buy the seeds."]
-
- if ground_truth_solution != '':
-     question_wgt = question + '\n\n###\n\nThe reference answer is: ' + ground_truth_solution
- else:
-     question_wgt = question + '\n\n###\n\nThe reference answer is: There is no reference answer for this question.'
-
- judge_list_infer = []
- with torch.no_grad():
-     for step_idx in range(1, len(steps) + 1):
-         responses = "\n\n".join(steps[:step_idx]) + "\n\n"
-         messages = [
-             {"role": "system", "content": "You are a helpful assistant."},
-             {"role": "user", "content": question_wgt}
-         ]
-         query_id = tokenizer.apply_chat_template(
-             messages,
-             tokenize=True,
-             add_generation_prompt=True
-         )
-         answer_tokens = tokenizer(responses)['input_ids']
-         answer_tokens += [tokenizer.eos_token_id]
-         QA_ids = query_id + answer_tokens
-
-         input_ids = torch.tensor([QA_ids]).long().cuda().contiguous()
-
-         outputs = model(input_ids=input_ids)
-         reward = torch.sigmoid(outputs[0]).cpu().item()
-         judge_list_infer.append(reward)
-
- print(judge_list_infer) # [0.73828125, 0.7265625, 0.73046875, 0.73828125, 0.734375]
-
- ```
+ ---
+ license: apache-2.0
+ ---
+ # Universal-PRM-7B
+ ## 1. Overview
+ Universal-PRM-7B is a process reward model (PRM) trained from Qwen2.5-Math-7B-Instruct as the base model. Its training combines diverse policy distributions, ensemble prompting, and reverse verification to improve generalization and robustness. Among the PRMs compared below, it achieves the highest average score on both ProcessBench and UniversalBench, an internally developed benchmark.
+ ## 2. Experiments
+ ### ProcessBench
+ | Model | GSM8K | MATH | Olympiad-Bench | Omni-MATH | Average |
+ |-------------------------------|------:|-----:|---------------:|----------:|--------:|
+ | Math-Shepherd-PRM-7B | 47.9 | 29.5 | 24.8 | 23.8 | 31.5 |
+ | RLHFlow-PRM-Mistral-8B | 50.4 | 33.4 | 13.8 | 15.8 | 28.4 |
+ | Skywork-PRM-7B | 70.8 | 53.6 | 22.9 | 21.0 | 42.1 |
+ | Qwen2.5-Math-7B-PRM800K | 68.2 | 62.6 | 50.7 | 44.3 | 56.5 |
+ | Qwen2.5-Math-PRM-7B | 82.4 | 77.6 | 67.5 | 66.3 | 73.5 |
+ | **Universal-PRM-7B** | **85.8** | **77.7** | **67.6** | **66.4** | **74.3** |
+ ### UniversalBench
+ | Model | AIME (long) | AMC (long) | IMO (long) | Olympiads (long) | GSM8K (short) | Olympiads (short) | MATH (short) | Average |
+ |----------------------------------|------------:|-----------:|-----------:|-----------------:|--------------:|------------------:|-------------:|--------:|
+ | Math-Shepherd-PRM-7B | **60.0** | 14.1 | 57.6 | 49.3 | 40.8 | 24.3 | 43.9 | 41.4 |
+ | RLHFlow-PRM-Mistral-8B | 18.7 | 34.6 | 23.7 | 11.3 | 72.1 | 45.0 | 56.8 | 37.4 |
+ | Skywork-PRM-7B | 24.0 | 13.2 | 21.8 | 16.5 | 33.9 | 61.7 | 31.8 | 28.9 |
+ | Qwen2.5-Math-7B-PRM800K | 57.1 | 56.8 | **65.4** | 54.9 | 89.6 | 74.0 | 81.9 | 68.5 |
+ | Qwen2.5-Math-PRM-7B | 49.0 | 61.6 | 45.3 | 60.2 | 88.8 | 73.7 | 80.7 | 65.6 |
+ | **Universal-PRM-7B** | 59.5 | **76.2** | 62.8 | **65.5** | **91.9** | **80.2** | **85.8** | **74.5** |
+
+ ## 3. Quick Start
+ ```python
+ from transformers import AutoModel, AutoTokenizer
+ import torch
+ import json
+
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+ model_path = 'infly/Universal-PRM-7B'
+
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
+ model = AutoModel.from_pretrained(
+     model_path,
+     device_map=device,
+     torch_dtype=torch.bfloat16,
+     trust_remote_code=True,
+ ).eval()
+
+ question = "It's April, and Mrs. Rylan has been busy on her farm planting different types of vegetables for the season. She has bought 20 packets of tomato seeds and 80 packets of celery seeds to plant. If a packet of tomato seeds costs $40 and a packet of celery seeds costs $30, how much money did she use to buy the seeds?"
+ ground_truth_solution = "The total amount of money she used to buy the tomato seeds is 20 packets * $40/packet = $<<20*40=800>>800\nThe celery seeds cost her 80 packets * $30/packet = $<<80*30=2400>>2400\nFor the seeds, Mrs. Rylan paid $2400 + $800 = $<<2400+800=3200>>3200\n#### 3200"
+ steps = ["To find out how much money Mrs. Rylan used to buy the seeds, we need to calculate the total cost of tomato seeds and celery seeds separately, then add them together.", "First, calculate the total cost of tomato seeds. Number of packets of tomato seeds = 20. Cost per packet of tomato seeds = $40. Total cost of tomato seeds = Number of packets of tomato seeds * Cost per packet of tomato seeds = 20 * $40 = $800.", "Second, calculate the total cost of celery seeds. Number of packets of celery seeds = 80. Cost per packet of celery seeds = $30. Total cost of celery seeds = Number of packets of celery seeds * Cost per packet of celery seeds = 80 * $30 = $2400.", "Finally, calculate the total amount of money used to buy the seeds. Total amount of money = Total cost of tomato seeds + Total cost of celery seeds = $800 + $2400 = $3200.", "Therefore, Mrs. Rylan used \\boxed{$3200} to buy the seeds."]
+
+ if ground_truth_solution != '':
+     question_wgt = question + '\n\n###\n\nThe reference answer is: ' + ground_truth_solution
+ else:
+     question_wgt = question + '\n\n###\n\nThe reference answer is: There is no reference answer for this question.'
+
+ judge_list_infer = []
+ with torch.no_grad():
+     for step_idx in range(1, len(steps) + 1):
+         responses = "\n\n".join(steps[:step_idx]) + "\n\n"
+         messages = [
+             {"role": "system", "content": "You are a helpful assistant."},
+             {"role": "user", "content": question_wgt}
+         ]
+         query_id = tokenizer.apply_chat_template(
+             messages,
+             tokenize=True,
+             add_generation_prompt=True
+         )
+         answer_tokens = tokenizer(responses)['input_ids']
+         answer_tokens += [tokenizer.eos_token_id]
+         QA_ids = query_id + answer_tokens
+
+         input_ids = torch.tensor([QA_ids]).long().to(device)
+
+         outputs = model(input_ids=input_ids)
+         reward = torch.sigmoid(outputs[0]).cpu().item()
+         judge_list_infer.append(reward)
+
+ print(judge_list_infer) # [0.73828125, 0.7265625, 0.73046875, 0.73828125, 0.734375]
+
+ ```
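The Quick Start loop yields one reward per solution prefix, so index `i` of `judge_list_infer` scores the solution up to and including `steps[i]`. A minimal post-processing sketch follows; it is not part of the released code. It flags the earliest step whose reward drops below a threshold, then scores such predictions the way the ProcessBench authors describe their metric: the harmonic mean (F1) of accuracy on erroneous solutions and accuracy on fully correct ones. The 0.5 threshold, the `-1` "no error" encoding, and the helper names `first_error_step` and `processbench_f1` are assumptions for illustration, not settings taken from this model card.

```python
# Hypothetical post-processing for the per-step rewards produced by the
# Quick Start loop. The 0.5 threshold and the -1 "no error" label are assumptions.


def first_error_step(step_rewards: list[float], threshold: float = 0.5) -> int:
    """Return the index of the first step whose reward is below `threshold`,
    or -1 if every step passes (the solution is judged fully correct)."""
    for idx, reward in enumerate(step_rewards):
        if reward < threshold:
            return idx
    return -1


def processbench_f1(preds: list[int], labels: list[int]) -> float:
    """Harmonic mean of accuracy on erroneous samples (label != -1) and on
    fully correct samples (label == -1), on a 0-100 scale.

    Predictions and labels use the same encoding as first_error_step.
    Assumes both subsets are non-empty.
    """
    erroneous = [(p, l) for p, l in zip(preds, labels) if l != -1]
    correct = [(p, l) for p, l in zip(preds, labels) if l == -1]
    acc_err = 100 * sum(p == l for p, l in erroneous) / len(erroneous)
    acc_cor = 100 * sum(p == l for p, l in correct) / len(correct)
    if acc_err + acc_cor == 0:
        return 0.0
    return 2 * acc_err * acc_cor / (acc_err + acc_cor)


# Rewards printed by the Quick Start example for the (correct) seed-cost solution.
rewards = [0.73828125, 0.7265625, 0.73046875, 0.73828125, 0.734375]
print(first_error_step(rewards))                      # -1: no step falls below 0.5
print(processbench_f1([2, -1, 1, 3], [2, -1, 0, -1]))  # 50.0
```

With the sample rewards every prefix scores about 0.73, so no step is flagged, which is consistent with the reference solution being correct. In practice the threshold may need tuning on labeled data.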