---
license: mit
language:
- en
datasets:
- akjindal53244/Arithmo-Data
tags:
- Mathematical Reasoning
---
**Arithmo2-Mistral-7B** improves on the initially released [Arithmo-Mistral-7B](https://huggingface.co./akjindal53244/Arithmo-Mistral-7B) model on both the GSM8K and MATH benchmarks. Specifically, it brings an **absolute** improvement of:
- +1.7% on GSM8K
- +3.0% on GSM8K PoT
- +1.9% on MATH
**This repo contains the final merged model**. If you are interested in the LoRA adapter, use the [LoRA Adapter](https://huggingface.co./upaya07/Arithmo2-Mistral-7B-adapter) repo instead.
### Model Description
- **Project GitHub Page:** https://github.com/akjindal53244/Arithmo
- **Developed by:** [Ashvini Kumar Jindal](https://www.linkedin.com/in/ashvini-jindal-26653262/)
- **Funded by:** self-funded
- **Model type:** fine-tuned with QLoRA on a single GPU
- **Language(s) (NLP):** English
- **Finetuned from model:** mistralai/Mistral-7B-v0.1
## Results
Arithmo2-Mistral-7B is an improved version of the [Arithmo-Mistral-7B](https://huggingface.co./akjindal53244/Arithmo-Mistral-7B) model and is competitive with fully fine-tuned state-of-the-art 7B mathematical reasoning models. Refer to the [Comparing Arithmo models with other SFT LLM models](https://github.com/akjindal53244/Arithmo/tree/master?tab=readme-ov-file#comparing-arithmo-models-with-other-sft-llm-models) section for more details.
<table>
<thead>
<tr>
<th>Prompt Approach</th>
<th>GSM8k</th>
<th>MATH</th>
</tr>
</thead>
<tbody>
<tr>
<td>Zero-Shot CoT</td>
<td><b>76.4</b></td>
<td><b>27.2</b></td>
</tr>
<tr>
<td>Zero-Shot PoT</td>
<td><b>74.2</b></td>
<td>-</td>
</tr>
</tbody>
</table>
- **Zero-Shot CoT**: Given a question as the prompt, the model generates reasoning steps along with the final answer. We check whether the answer matches the ground truth.
- **Zero-Shot PoT**: We prompt the model to generate a Python program for the given question. During inference, we execute the generated program and check whether its output matches the ground-truth answer (a minimal sketch of this scoring loop is shown below).
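The snippet below is a minimal sketch of the PoT scoring loop described above. It illustrates the general procedure only; it is not the exact evaluation harness used to produce the reported numbers, and the helper names are made up for this example.
```
import io
import contextlib

def run_generated_program(program: str) -> str:
    """Execute a model-generated Python program and capture what it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(program, {})  # untrusted code: sandbox this in a real evaluation
    return buffer.getvalue().strip()

def is_correct(program: str, ground_truth: str) -> bool:
    try:
        return run_generated_program(program) == ground_truth.strip()
    except Exception:
        return False  # programs that crash are counted as wrong

# Example: a program the model might emit for "What is 2+2?"
print(is_correct("print(2 + 2)", "4"))  # True
```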
## Installation
```
pip install "transformers>=4.34.0"
pip install accelerate
pip install sentencepiece
pip install protobuf
# If you are GPU poor like me
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
# If you have a GPU.
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu118
pip install scipy
pip install bitsandbytes
```
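After installing, an optional quick check that the key packages import correctly (the version notes are just what the commands above are expected to install):
```
import torch
import transformers

print("transformers:", transformers.__version__)     # expected >= 4.34.0
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())  # False for the CPU-only install
```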
## How to query the model
```
# Set `run_model_on_gpu` to `False` if you are running on CPU. The model will generate reasoning steps with an answer for your question. If you want it to generate a Python program instead, uncomment line 69 of the script, which adds a Python prompt.
$ python query_model.py
```
**Note:** The above script automatically formats the prompt for you, so you just need to type the question (e.g., `What is 2+2?`) without any prefix like `Question:`. Check out [query_model.py](https://github.com/akjindal53244/Arithmo/blob/master/query_model.py) for more details. <br><br>
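If you prefer not to use `query_model.py`, the sketch below shows one way to query the model directly with `transformers`. The repository id, dtype, and generation settings are assumptions for illustration; the prompt follows the CoT format described later in this card.
```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "upaya07/Arithmo2-Mistral-7B"  # assumed id of this merged model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # use torch.float32 on CPU
    device_map="auto",
)

# CoT format: "Question: <question>\nAnswer:"
prompt = "Question: What is 2+2?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```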
##### Sample Input:
```
Question: There are total 10 children. I have to give 1 apple to first child, 2 apples to second child, 3 apples to third child, and so on. How many apples do I need?
```
##### Model Output:
```
Answer: The total number of apples needed is the sum of the first 10 positive integers.
This can be calculated using the formula for the sum of an arithmetic series:
\[S = \frac{n}{2}(a_1 + a_n),\]
where $S$ is the sum, $n$ is the number of terms, $a_1$ is the first term, and $a_n$ is the last term.
In this case, $n = 10$, $a_1 = 1$, and $a_n = 10$.
Plugging these values into the formula, we get:
\[S = \frac{10}{2}(1 + 10) = 5(11) = \boxed{55}.\]
The answer is: 55
```
Arithmo2-Mistral-7B is trained with the same prompt format as [Arithmo-Mistral-7B](https://huggingface.co./akjindal53244/Arithmo-Mistral-7B):
#### CoT Format (generate reasoning steps with answer):
```
Question: <question>
Answer:
```
#### PoT Format (generate a python program):
```
Question: <question> <python_prompt>
Answer:
```
The model performs best when queried in one of these formats, so use the same templates in your own scripts.
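As a small illustration, the helper below builds prompts in the two templates above. The PoT suffix here is a placeholder assumption; the exact `<python_prompt>` string is in [query_model.py](https://github.com/akjindal53244/Arithmo/blob/master/query_model.py) in the GitHub repo.
```
def build_prompt(question: str, use_pot: bool = False) -> str:
    """Format a raw question in the CoT or PoT template expected by the model."""
    # Placeholder PoT suffix; see query_model.py for the exact python_prompt
    # string used during training.
    python_prompt = " Write a Python program to solve this."
    if use_pot:
        return f"Question: {question}{python_prompt}\nAnswer:"
    return f"Question: {question}\nAnswer:"

print(build_prompt("What is 2+2?"))
print(build_prompt("What is 2+2?", use_pot=True))
```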
## Comparing Arithmo models with other SFT LLM models
Results for all models except `Arithmo2-Mistral-7B` are taken from the [MetaMath](https://github.com/meta-math/MetaMath/blob/main/README.MD) repository.
| Model | GSM8k Pass@1 | MATH Pass@1 | Fine-tuning |
|---------------------|--------------|-------------|-------------|
| MPT-7B | 6.8 | 3.0 | -- |
| Falcon-7B | 6.8 | 2.3 | -- |
| LLaMA-1-7B | 11.0 | 2.9 | -- |
| LLaMA-2-7B | 14.6 | 2.5 | -- |
| MPT-30B | 15.2 | 3.1 | -- |
| LLaMA-1-13B | 17.8 | 3.9 | -- |
| GPT-Neo-2.7B | 19.5 | -- | -- |
| Falcon-40B | 19.6 | 2.5 | -- |
| Baichuan-chat-13B | 23.9 | -- | -- |
| Vicuna-v1.3-13B | 27.6 | -- | -- |
| LLaMA-2-13B | 28.7 | 3.9 | -- |
| InternLM-7B | 31.2 | -- | -- |
| ChatGLM-2-6B | 32.4 | -- | -- |
| GPT-J-6B | 34.9 | -- | -- |
| LLaMA-1-33B | 35.6 | 3.9 | -- |
| LLaMA-2-34B | 42.2 | 6.24 | -- |
| RFT-7B | 50.3 | -- | -- |
| LLaMA-1-65B | 50.9 | 10.6 | -- |
| Qwen-7B | 51.6 | -- | -- |
| WizardMath-7B | 54.9 | 10.7 | -- |
| LLaMA-2-70B | 56.8 | 13.5 | -- |
| WizardMath-13B | 63.9 | 14.0 | -- |
| MetaMath-7B | 66.5 | 19.8 | -- |
| MetaMath-13B | 72.3 | 22.4 | -- |
| Arithmo-Mistral-7B (PoT) | 71.2 | -- | SFT: 4-bit QLoRA |
| Arithmo2-Mistral-7B (PoT) | 74.2 | -- | SFT: 4-bit QLoRA |
| MetaMath-Mistral-7B | 77.7 | 28.2 | SFT: Full fine-tuned |
| Arithmo-Mistral-7B | 74.7 | 25.3 | SFT: 4-bit QLoRA |
| 🔥 **Arithmo2-Mistral-7B** | **76.4** | **27.2** | **SFT: 4-bit QLoRA** |
If you are interested in reproducing the results, see the [Reproducing Results](https://github.com/akjindal53244/Arithmo#reproducing-results) section of the GitHub repo.
### Support My Work
Building LLMs takes time and resources; if you find my work interesting, your support would be epic!
<a href="https://www.buymeacoffee.com/a_little_learner" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" style="height: 60px !important;width: 217px !important;" ></a>
### Citation
To cite Arithmo models:
```
@misc{jindal_2023_arithmo,
author = {Jindal, Ashvini},
title = {Arithmo-Mistral-7B: Mathematical Reasoning Model},
howpublished = {Hugging Face},
month = {October},
year = {2023},
url = {https://huggingface.co./akjindal53244/Arithmo-Mistral-7B}
}
```
<h2 id="References">References</h2>
```
@article{yu2023metamath,
title={MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models},
author={Yu, Longhui and Jiang, Weisen and Shi, Han and Yu, Jincheng and Liu, Zhengying and Zhang, Yu and Kwok, James T and Li, Zhenguo and Weller, Adrian and Liu, Weiyang},
journal={arXiv preprint arXiv:2309.12284},
year={2023}
}
@article{yue2023mammoth,
title={MAmmoTH: Building math generalist models through hybrid instruction tuning},
author={Yue, Xiang and Qu, Xingwei and Zhang, Ge and Fu, Yao and Huang, Wenhao and Sun, Huan and Su, Yu and Chen, Wenhu},
journal={arXiv preprint arXiv:2309.05653},
year={2023}
}
@article{mishra2022lila,
title={Lila: A unified benchmark for mathematical reasoning},
author={Mishra, Swaroop and Finlayson, Matthew and Lu, Pan and Tang, Leonard and Welleck, Sean and Baral, Chitta and Rajpurohit, Tanmay and Tafjord, Oyvind and Sabharwal, Ashish and Clark, Peter and Kalyan, Ashwin},
journal={arXiv preprint arXiv:2210.17517},
year={2022}
}
```