infly/Universal-PRM-7B


MinghaoYang, ytc0324 committed 16 days ago

Commit 3b068e5 (verified) · 1 parent: d0d996d

Update README.md (#1)


- Update README.md (b5b1bf995dba5339b8c9349401e7687aba85e649)

Co-authored-by: tianchuyao <[email protected]>

Files changed (1)

README.md +79 -56


@@ -1,56 +1,79 @@
----
-license: apache-2.0
----
-Inference Demo
-```python
-from transformers import AutoModel, AutoTokenizer
-import torch
-import json
-device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
-model_path = 'model_path'
-tokenizer = AutoTokenizer.from_pretrained(model_path)
-model = AutoModel.from_pretrained(
-    model_path,
-    device_map=device,
-    torch_dtype=torch.bfloat16,
-    trust_remote_code=True,
-).eval()
-question = "It's April, and Mrs. Rylan has been busy on her farm planting different types of vegetables for the season. She has bought 20 packets of tomato seeds and 80 packets of celery seeds to plant. If a packet of tomato seeds costs $40 and a packet of celery seeds costs $30, how much money did she use to buy the seeds?"
-ground_truth_solution = "The total amount of money she used to buy the tomato seeds is 20 packets * $40/packet = $<<20*40=800>>800\nThe celery seeds cost her 80 packets * $30/packet = $<<80*30=2400>>2400\nFor the seeds, Mrs. Rylan paid $2400 + $800 = $<<2400+800=3200>>3200\n#### 3200"
-steps = ["To find out how much money Mrs. Rylan used to buy the seeds, we need to calculate the total cost of tomato seeds and celery seeds separately, then add them together.", "First, calculate the total cost of tomato seeds. Number of packets of tomato seeds = 20. Cost per packet of tomato seeds = $40. Total cost of tomato seeds = Number of packets of tomato seeds * Cost per packet of tomato seeds = 20 * $40 = $800.", "Second, calculate the total cost of celery seeds. Number of packets of celery seeds = 80. Cost per packet of celery seeds = $30. Total cost of celery seeds = Number of packets of celery seeds * Cost per packet of celery seeds = 80 * $30 = $2400.", "Finally, calculate the total amount of money used to buy the seeds. Total amount of money = Total cost of tomato seeds + Total cost of celery seeds = $800 + $2400 = $3200.", "Therefore, Mrs. Rylan used \\boxed{$3200} to buy the seeds."]
-if ground_truth_solution != '':
-    question_wgt = question + '\n\n###\n\nThe reference answer is: ' + ground_truth_solution
-else:
-    question_wgt = question + '\n\n###\n\nThe reference answer is: There is no reference answer for this question.'
-judge_list_infer = []
-with torch.no_grad():
-    for step_idx in range(1, len(steps) + 1):
-        responses = "\n\n".join(steps[:step_idx]) + "\n\n"
-        messages = [
-                {"role": "system", "content": "You are a helpful assistant."},
-                {"role": "user", "content": question_wgt}
-            ]
-        query_id = tokenizer.apply_chat_template(
-                messages,
-                tokenize=True,
-                add_generation_prompt=True
-            )
-        answer_tokens = tokenizer(responses)['input_ids']
-        answer_tokens += [tokenizer.eos_token_id]
-        QA_ids = query_id + answer_tokens
-        input_ids = torch.tensor([QA_ids]).long().cuda().contiguous()
-        outputs = model(input_ids=input_ids)
-        reward = torch.sigmoid(outputs[0]).cpu().item()
-        judge_list_infer.append(reward)
-print(judge_list_infer)     # [0.73828125, 0.7265625, 0.73046875, 0.73828125, 0.734375]
-```

+---
+license: apache-2.0
+---
+# Universal-PRM-7B
+## 1. Overview
+Universal-PRM-7B is a process reward model (PRM) trained from Qwen2.5-Math-7B-Instruct. Training combines diverse policy distributions, ensemble prompting, and reverse verification to improve generalization and robustness. In the tables below, it posts the highest average score among the listed PRMs on both ProcessBench and the internally developed UniversalBench.
+## 2. Experiments
+### ProcessBench
+| Model                         | GSM8K | MATH | Olympiad-Bench | Omni-MATH | Average |
+|-------------------------------|------:|-----:|---------------:|----------:|--------:|
+| Math-Shepherd-PRM-7B          |  47.9 | 29.5 |           24.8 |      23.8 |    31.5 |
+| RLHFflow-PRM-Mistral-8B       |  50.4 | 33.4 |           13.8 |      15.8 |    28.4 |
+| Skywork-PRM-7B                |  70.8 | 53.6 |           22.9 |      21.0 |    42.1 |
+| Qwen2.5-Math-7B-PRM800K       |  68.2 | 62.6 |           50.7 |      44.3 |    56.5 |
+| Qwen2.5-Math-PRM-7B           |  82.4 | 77.6 |           67.5 |      66.3 |    73.5 |
+| **Universal-PRM-7B**          | **85.8** | **77.7** | **67.6** | **66.4** | **74.3** |
+### UniversalBench
+| Model                    | AIME (long) | AMC (long) | IMO (long) | Olympiads (long) | GSM8K (short) | Olympiads (short) | MATH (short) | Average |
+|--------------------------|------------:|-----------:|-----------:|-----------------:|--------------:|------------------:|-------------:|--------:|
+| Math-Shepherd-PRM-7B     |    **60.0** |       14.1 |       57.6 |             49.3 |          40.8 |              24.3 |         43.9 |    41.4 |
+| RLHFflow-PRM-Mistral-8B  |        18.7 |       34.6 |       23.7 |             11.3 |          72.1 |              45.0 |         56.8 |    37.4 |
+| Skywork-PRM-7B           |        24.0 |       13.2 |       21.8 |             16.5 |          33.9 |              61.7 |         31.8 |    28.9 |
+| Qwen2.5-Math-7B-PRM800K  |        57.1 |       56.8 |   **65.4** |             54.9 |          89.6 |              74.0 |         81.9 |    68.5 |
+| Qwen2.5-Math-PRM-7B      |        49.0 |       61.6 |       45.3 |             60.2 |          88.8 |              73.7 |         80.7 |    65.6 |
+| **Universal-PRM-7B**     |        59.5 |   **76.2** |       62.8 |         **65.5** |      **91.9** |          **80.2** |     **85.8** | **74.5** |
+## 3. Quick Start
+```python
+from transformers import AutoModel, AutoTokenizer
+import torch
+import json
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+model_path = 'infly/Universal-PRM-7B'
+tokenizer = AutoTokenizer.from_pretrained(model_path)
+model = AutoModel.from_pretrained(
+    model_path,
+    device_map=device,
+    torch_dtype=torch.bfloat16,
+    trust_remote_code=True,
+).eval()
+question = "It's April, and Mrs. Rylan has been busy on her farm planting different types of vegetables for the season. She has bought 20 packets of tomato seeds and 80 packets of celery seeds to plant. If a packet of tomato seeds costs $40 and a packet of celery seeds costs $30, how much money did she use to buy the seeds?"
+ground_truth_solution = "The total amount of money she used to buy the tomato seeds is 20 packets * $40/packet = $<<20*40=800>>800\nThe celery seeds cost her 80 packets * $30/packet = $<<80*30=2400>>2400\nFor the seeds, Mrs. Rylan paid $2400 + $800 = $<<2400+800=3200>>3200\n#### 3200"
+steps = ["To find out how much money Mrs. Rylan used to buy the seeds, we need to calculate the total cost of tomato seeds and celery seeds separately, then add them together.", "First, calculate the total cost of tomato seeds. Number of packets of tomato seeds = 20. Cost per packet of tomato seeds = $40. Total cost of tomato seeds = Number of packets of tomato seeds * Cost per packet of tomato seeds = 20 * $40 = $800.", "Second, calculate the total cost of celery seeds. Number of packets of celery seeds = 80. Cost per packet of celery seeds = $30. Total cost of celery seeds = Number of packets of celery seeds * Cost per packet of celery seeds = 80 * $30 = $2400.", "Finally, calculate the total amount of money used to buy the seeds. Total amount of money = Total cost of tomato seeds + Total cost of celery seeds = $800 + $2400 = $3200.", "Therefore, Mrs. Rylan used \\boxed{$3200} to buy the seeds."]
+if ground_truth_solution != '':
+    question_wgt = question + '\n\n###\n\nThe reference answer is: ' + ground_truth_solution
+else:
+    question_wgt = question + '\n\n###\n\nThe reference answer is: There is no reference answer for this question.'
+judge_list_infer = []
+with torch.no_grad():
+    for step_idx in range(1, len(steps) + 1):
+        responses = "\n\n".join(steps[:step_idx]) + "\n\n"
+        messages = [
+                {"role": "system", "content": "You are a helpful assistant."},
+                {"role": "user", "content": question_wgt}
+            ]
+        query_id = tokenizer.apply_chat_template(
+                messages,
+                tokenize=True,
+                add_generation_prompt=True
+            )
+        answer_tokens = tokenizer(responses)['input_ids']
+        answer_tokens += [tokenizer.eos_token_id]
+        QA_ids = query_id + answer_tokens
+        input_ids = torch.tensor([QA_ids]).long().to(device).contiguous()  # follow `device` so the CPU fallback also works
+        outputs = model(input_ids=input_ids)
+        reward = torch.sigmoid(outputs[0]).cpu().item()
+        judge_list_infer.append(reward)
+print(judge_list_infer)     # [0.73828125, 0.7265625, 0.73046875, 0.73828125, 0.734375]
+```
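
The Quick Start script ends with one reward per step prefix in `judge_list_infer`, but does not show what to do with that list. The sketch below is one possible way to post-process it, not part of the committed README: it reports the first step whose reward falls below a cutoff and reduces the list to a single solution-level score. The helper names `first_low_step` and `solution_score`, the `0.5` threshold, and the choice of `min` as the aggregate are all assumptions made for illustration; the model card does not specify a threshold or an aggregation rule.

```python
from typing import List, Optional


def first_low_step(rewards: List[float], threshold: float = 0.5) -> Optional[int]:
    """Return the 0-based index of the first step scored below `threshold`.

    Returns None when every step clears the threshold. The default of 0.5 is
    an illustrative assumption, not a value taken from the model card.
    """
    for idx, reward in enumerate(rewards):
        if reward < threshold:
            return idx
    return None


def solution_score(rewards: List[float]) -> float:
    """Collapse per-step rewards into one score using the weakest step.

    Taking the minimum is one common convention; it is an assumption here.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    return min(rewards)


# Rewards copied from the comment on the final print line of the Quick Start.
demo_rewards = [0.73828125, 0.7265625, 0.73046875, 0.73828125, 0.734375]

print(first_low_step(demo_rewards))  # None: no step falls below 0.5
print(solution_score(demo_rewards))  # 0.7265625
```

Because the helpers only take a list of floats, they can be applied to `judge_list_infer` directly after the scoring loop, and the threshold can be tuned on held-out data if a different operating point is needed.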