---
license: mit
datasets:
- xiaodongguaAIGC/step_sft
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- math
- o1
- reasoning
- step
- search
base_model:
- meta-llama/Llama-3.1-8B
---


# 小冬瓜AIGC: Step-Wise Math Reasoning

Test: [Colab](https://colab.research.google.com/drive/17-eEER7sV7xJ66pKiDIB9gEuS8phXeX3?usp=sharing)

## Result

At test time, rejection sampling can be run multiple times; the final answer is output in `\boxed{}` format.
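
The rejection sampling loop mentioned above could look roughly like the sketch below. It is not the code from the Colab notebook: `extract_boxed`, `rejection_sample`, the `generate` callable, and the acceptance rule (keep the first sample that contains a parseable `\boxed{}` answer) are all assumptions made for illustration.

```python
from typing import Callable, Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    # Walk forward and track brace depth so nested braces (e.g. \frac{1}{2}) survive.
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # braces never closed


def rejection_sample(
    generate: Callable[[str], str],  # hypothetical: maps a prompt to one sampled completion
    prompt: str,
    max_tries: int = 8,
) -> Optional[str]:
    """Resample until a completion contains a boxed answer (assumed acceptance rule)."""
    for _ in range(max_tries):
        answer = extract_boxed(generate(prompt))
        if answer is not None:
            return answer
    return None  # every sample was rejected
```

A stricter acceptance rule, such as comparing the extracted answer with a reference or taking a majority vote over several samples, can replace the `is not None` check.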

```python
prompt = 'Tom has 12 apples. He gives 3 apples to each of his 4 friends. After that, he buys 10 more apples. How many apples does Tom have now?'
step_generation(prompt, 128)
```

Output:

```text
<|begin_of_text|>###System: You are MA-RLHF Chatbot, you should friendly answer the question
###Question:Solve this math problem using step-by-step reasoning. Require that the output of each step ends with the " [SEP]
" token.
Tom has 12 apples. He gives 3 apples to each of his 4 friends. After that, he buys 10 more apples. How many apples does Tom have now?
###Answer: At first, Tom has 12 apples. [SEP]
He gives 3 apples to each of his 4 friends, so he gives him 4 * 3 = 12 apples. [SEP]
After that, Tom has 12 - 12 = 0 apples left. [SEP]
He buys 10 more apples, so he has 0 + 10 = 10 apples now. [SEP]
Tom has 10 apples now. [SEP]
Answer: 10 [SEP]
I agree. [SEP]
A possible answer.

# Answer

10 [SEP]
<|end_of_text|>
```
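
Because each step in the output above ends with `[SEP]`, the generated text can be split back into individual reasoning steps. The helper below is a minimal sketch under that assumption; `split_steps` is a hypothetical name, and the `###Answer:` and `<|end_of_text|>` markers are taken from the sample output shown above.

```python
from typing import List


def split_steps(generated: str, sep: str = "[SEP]") -> List[str]:
    """Split the answer part of a generation into a list of reasoning steps."""
    marker = "###Answer:"
    # Drop the prompt: the question itself mentions the separator token,
    # so only the text after the answer marker is split.
    if marker in generated:
        generated = generated.split(marker, 1)[1]
    generated = generated.replace("<|end_of_text|>", "")
    # Strip whitespace around each step and discard empty fragments.
    return [step.strip() for step in generated.split(sep) if step.strip()]
```

For the sample above, the first element would be `At first, Tom has 12 apples.` and later elements hold the remaining steps in order, which makes it possible to inspect or score each step separately.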