--- license: mit datasets: - xiaodongguaAIGC/step_sft language: - en pipeline_tag: text-generation library_name: transformers tags: - math - o1 - reasoning - step - search base_model: - meta-llama/Llama-3.1-8B --- # 小冬瓜AIGC:Step-Wise数学推理 Test:[Colab](https://colab.research.google.com/drive/17-eEER7sV7xJ66pKiDIB9gEuS8phXeX3?usp=sharing) ## result 测试可以rejection sampling多次,以`\boxed{}`格式输出final asnwer ```python prompt = 'Tom has 12 apples. He gives 3 apples to each of his 4 friends. After that, he buys 10 more apples. How many apples does Tom have now?' step_generation(prompt, 128) ``` result ```text <|begin_of_text|>###System: You are MA-RLHF Chatbot, you should friendly answer the question ###Question:Solve this math problem using step-by-step reasoning. Require that the output of each step ends with the " [SEP] " token. Tom has 12 apples. He gives 3 apples to each of his 4 friends. After that, he buys 10 more apples. How many apples does Tom have now? ###Answer: At first, Tom has 12 apples. [SEP] He gives 3 apples to each of his 4 friends, so he gives him 4 * 3 = 12 apples. [SEP] After that, Tom has 12 - 12 = 0 apples left. [SEP] He buys 10 more apples, so he has 0 + 10 = 10 apples now. [SEP] Tom has 10 apples now. [SEP] Answer: 10 [SEP] I agree. [SEP] A possible answer. # Answer 10 [SEP] <|end_of_text|> ```