---
license: mit
datasets:
- xiaodongguaAIGC/step_sft
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- math
- o1
- reasoning
- step
- search
base_model:
- meta-llama/Llama-3.1-8B
---

# 小冬瓜AIGC: Step-Wise Math Reasoning

Test: [Colab](https://colab.research.google.com/drive/17-eEER7sV7xJ66pKiDIB9gEuS8phXeX3?usp=sharing)

## Result

For testing, rejection sampling can be run multiple times, with the final answer output in `\boxed{}` format.

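The rejection-sampling note can be sketched as follows. This is a minimal illustration, not the notebook's code: `extract_boxed`, `rejection_sample`, and the `sample` callable are hypothetical names, and the acceptance rule (keep the first output that contains a `\boxed{}` answer) is an assumption.

```python
import re
from typing import Callable, Optional

# Matches \boxed{...} with no nested braces (e.g. \boxed{\frac{1}{2}} is NOT handled).
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_boxed(text: str) -> Optional[str]:
    # Return the content of the last \boxed{...} in the text, or None if absent.
    matches = BOXED.findall(text)
    return matches[-1].strip() if matches else None


def rejection_sample(sample: Callable[[], str], max_tries: int = 8) -> Optional[str]:
    # Hypothetical helper: draw outputs until one contains a boxed answer.
    # Outputs without \boxed{...} are rejected; give up after max_tries draws.
    for _ in range(max_tries):
        answer = extract_boxed(sample())
        if answer is not None:
            return answer
    return None
```

If `step_generation` returns the generated text (an assumption), it could be wrapped as `rejection_sample(lambda: step_generation(prompt, 128))`.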
```python
# `step_generation` is not defined in this README; it is presumably provided by the Colab notebook linked above.
prompt = 'Tom has 12 apples. He gives 3 apples to each of his 4 friends. After that, he buys 10 more apples. How many apples does Tom have now?'
step_generation(prompt, 128)
```

Output:

```text
<|begin_of_text|>###System: You are MA-RLHF Chatbot, you should friendly answer the question
###Question:Solve this math problem using step-by-step reasoning. Require that the output of each step ends with the " [SEP]
" token.
Tom has 12 apples. He gives 3 apples to each of his 4 friends. After that, he buys 10 more apples. How many apples does Tom have now?
###Answer: At first, Tom has 12 apples. [SEP]
He gives 3 apples to each of his 4 friends, so he gives him 4 * 3 = 12 apples. [SEP]
After that, Tom has 12 - 12 = 0 apples left. [SEP]
He buys 10 more apples, so he has 0 + 10 = 10 apples now. [SEP]
Tom has 10 apples now. [SEP]
Answer: 10 [SEP]
I agree. [SEP]
A possible answer.

# Answer

10 [SEP]
<|end_of_text|>
```
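
Because each step in the transcript ends with `[SEP]`, the completion can be split back into individual reasoning steps. The sketch below assumes the `###Answer:` marker and the `<|end_of_text|>` token appear exactly as in the transcript; `split_steps` is a hypothetical helper, not part of the model's code.

```python
from typing import List


def split_steps(completion: str) -> List[str]:
    # Keep only the text after the last "###Answer:" marker; if the marker
    # is missing, the whole completion is treated as the answer.
    _, _, answer = completion.rpartition("###Answer:")
    # Drop the end-of-text token, then split on the step separator.
    answer = answer.replace("<|end_of_text|>", "")
    return [step.strip() for step in answer.split("[SEP]") if step.strip()]


example = (
    '###Question:... each step ends with the " [SEP]\n" token.\n'
    "###Answer: At first, Tom has 12 apples. [SEP]\n"
    "After that, Tom has 12 - 12 = 0 apples left. [SEP]\n"
    "Answer: 10 [SEP]\n<|end_of_text|>"
)
for number, step in enumerate(split_steps(example), start=1):
    print(number, step)
```

Splitting after the answer marker keeps the `[SEP]` that appears inside the question's instructions from being counted as a step.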