Rank,Algorithm,LLM,Eval Date,Avg Score,gsm8k-Score,gsm8k-Cost($),AQuA-Score,AQuA-Cost($)
1.0,CoT,Qwen2.5-72B-Instruct,2025/1/22,89.55,92.87,0.7195,86.22,0.0808
2.0,SC-CoT,Qwen2.5-72B-Instruct,2025/1/22,89.45,93.86,5.9858,85.04,1.0348
3.0,CoT,Llama-3.3-70B-Instruct,2025/1/22,88.70,93.93,0.687,83.46,0.0927
4.0,SC-CoT,Llama-3.3-70B-Instruct,2025/1/22,88.68,95.07,6.2005,82.28,1.0756
5.0,SC-CoT,gpt-4o,2025/1/22,88.46,90.3,31.0542,86.61,8.1485
6.0,CoT,gpt-4o,2025/1/22,88.39,94.09,4.5367,82.68,1.0417
7.0,IO,Llama-3.3-70B-Instruct,2025/1/22,87.48,92.27,0.4709,82.68,0.0798
8.0,CoT,Doubao-lite-32k,2025/1/7,86.00,89.31,0.0558,82.68,0.0066
9.0,SC-CoT,Qwen2.5-7B-Instruct,2025/1/22,85.53,91.13,0.0,79.92,0.0
10.0,IO,Qwen2.5-72B-Instruct,2025/1/22,85.42,86.58,0.4899,84.25,0.0742
11.0,SC-CoT,Doubao-lite-32k,2025/1/7,84.18,87.26,0.2083,81.1,0.0519
12.0,PoT,gpt-4o,2025/1/22,84.15,93.1,4.2166,75.2,1.6087
13.0,PoT,Qwen2.5-72B-Instruct,2025/1/22,83.77,92.34,0.7054,75.2,0.1645
14.0,ReAct-Pro*,Llama-3.3-70B-Instruct,2025/1/22,83.39,87.64,10.1124,79.13,0.768
15.0,CoT,Qwen2.5-7B-Instruct,2025/1/22,83.19,85.67,0.0,80.71,0.0
16.0,IO,gpt-4o,2025/1/22,82.00,88.4,3.3463,75.59,1.1453
17.0,ReAct-Pro*,Doubao-lite-32k,2025/1/7,81.58,85.6,0.2512,77.56,0.0445
18.0,ReAct-Pro*,Qwen2.5-72B-Instruct,2025/1/22,80.25,87.26,10.5479,73.23,0.3177
19.0,ReAct-Pro*,Qwen2.5-7B-Instruct,2025/1/22,78.64,82.87,0.0,74.41,0.0
20.0,PoT,Llama-3.3-70B-Instruct,2025/1/22,76.31,73.09,0.9736,79.53,0.1746
21.0,PoT,Doubao-lite-32k,2025/1/7,75.63,79.61,0.0576,71.65,0.0147
22.0,IO,Doubao-lite-32k,2025/1/7,75.58,72.02,0.0354,79.13,0.0058
23.0,SC-CoT,gpt-3.5-turbo,2025/1/7,73.03,79.91,3.3938,66.14,0.7888
24.0,CoT,gpt-3.5-turbo,2025/1/7,69.86,78.7,0.6788,61.02,0.0957
25.0,ReAct-Pro*,gpt-3.5-turbo,2025/1/7,69.74,74.91,3.4633,64.57,0.4928
26.0,PoT,gpt-3.5-turbo,2025/1/7,68.17,76.88,0.6902,59.45,0.1748
27.0,CoT,Llama-3.1-8B-Instruct,2025/1/22,68.04,75.44,0.0,60.63,0.0
28.0,IO,Qwen2.5-7B-Instruct,2025/1/22,67.99,57.24,0.0,78.74,0.0
29.0,SC-CoT,Llama-3.1-8B-Instruct,2025/1/22,66.46,73.46,0.0,59.45,0.0
30.0,CoT,Internllm2_5-7B,2025/1/22,65.24,77.71,0.0,52.76,0.0
31.0,PoT,Qwen2.5-7B-Instruct,2025/1/22,63.47,58.83,0.0,68.11,0.0
32.0,ReAct-Pro*,Llama-3.1-8B-Instruct,2025/1/22,61.65,67.78,0.0,55.51,0.0
33.0,ReAct-Pro*,gpt-4o,2025/1/22,60.40,63.31,39.0751,57.48,2.304
34.0,IO,Llama-3.1-8B-Instruct,2025/1/22,54.17,57.16,0.0,51.18,0.0
35.0,CoT,Qwen2-1.5B-Instruct,2025/1/22,48.03,55.5,0.0,40.55,0.0
36.0,SC-CoT,Internllm2_5-7B,2025/1/22,43.80,48.22,0.0,39.37,0.0
37.0,IO,gpt-3.5-turbo,2025/1/7,38.41,37.83,0.3328,38.98,0.038
38.0,PoT,Llama-3.1-8B-Instruct,2025/1/22,37.64,38.67,0.0,36.61,0.0
39.0,PoT,Internllm2_5-7B,2025/1/22,37.41,38.21,0.0,36.61,0.0
40.0,ReAct-Pro*,Internllm2_5-7B,2025/1/22,37.23,33.51,0.0,40.94,0.0
41.0,CoT,Qwen2-0.5B-Instruct,2025/1/22,34.51,35.94,0.0,33.07,0.0
42.0,IO,Internllm2_5-7B,2025/1/22,29.62,11.6,0.0,47.64,0.0
43.0,ReAct-Pro*,Qwen2-1.5B-Instruct,2025/1/22,25.23,24.87,0.0,25.59,0.0
44.0,PoT,Qwen2-1.5B-Instruct,2025/1/22,24.61,18.5,0.0,30.71,0.0
45.0,IO,Qwen2-1.5B-Instruct,2025/1/22,22.91,16.68,0.0,29.13,0.0
46.0,IO,Qwen2-0.5B-Instruct,2025/1/22,20.94,14.71,0.0,27.17,0.0
47.0,SC-CoT,Qwen2-1.5B-Instruct,2025/1/22,17.69,11.75,0.0,23.62,0.0
48.0,ReAct-Pro*,Qwen2-0.5B-Instruct,2025/1/22,15.84,7.66,0.0,24.02,0.0
49.0,PoT,Qwen2-0.5B-Instruct,2025/1/22,13.47,9.62,0.0,17.32,0.0
50.0,SC-CoT,Qwen2-0.5B-Instruct,2025/1/22,12.25,1.67,0.0,22.83,0.0