---
library_name: peft
base_model: mistralai/Mistral-7B-v0.1
license: mit
tags:
- Mathematical Reasoning
datasets:
- akjindal53244/Arithmo-Data
language:
- en
---

**Arithmo2-Mistral-7B** improves on the initially released [Arithmo-Mistral-7B](https://huggingface.co./akjindal53244/Arithmo-Mistral-7B) model on both the GSM8K and MATH benchmarks. Specifically, there are **absolute** improvements of:
- +1.7% on GSM8K
- +3.0% on GSM8K PoT
- +1.9% on MATH

**This repo contains the LoRA adapter weights.** If you are interested in the final merged model, use [Arithmo2-Mistral-7B](https://huggingface.co./upaya07/Arithmo2-Mistral-7B) instead.


### Model Description

- **Project GitHub Page:** https://github.com/akjindal53244/Arithmo
- **Developed by:** [Ashvini Kumar Jindal](https://www.linkedin.com/in/ashvini-jindal-26653262/)
- **Funded by:** self-funded
- **Model type:** fine-tuned with QLoRA on a single GPU
- **Language(s) (NLP):** English
- **Finetuned from model:** mistralai/Mistral-7B-v0.1

## Results

Arithmo2-Mistral-7B is an improved version of the [Arithmo-Mistral-7B](https://huggingface.co./akjindal53244/Arithmo-Mistral-7B) model and is competitive with fully fine-tuned state-of-the-art 7B mathematical reasoning models. Refer to the [Comparing Arithmo models with other SFT LLM models](https://github.com/akjindal53244/Arithmo/tree/master?tab=readme-ov-file#comparing-arithmo-models-with-other-sft-llm-models) section for more details.

<table>
    <thead>
        <tr>
            <th>Prompt Approach</th>
            <th>GSM8k</th>
            <th>MATH</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Zero-Shot CoT</td>
            <td><b>76.4</b></td>
            <td><b>27.2</b></td>
        </tr>
        <tr>
            <td>Zero-Shot PoT</td>
            <td><b>74.2</b></td>
            <td>-</td>
        </tr>
    </tbody>
</table>

- **Zero-Shot CoT**: Given a question as the prompt, the model generates reasoning steps to solve it along with the answer. We check whether the answer matches the ground truth.
- **Zero-Shot PoT**: We prompt the model to generate a Python program for the given question. During inference, we execute the generated program and check whether its output matches the ground-truth answer.
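
For illustration, here is a minimal sketch of such a PoT check, assuming the generated program prints its final answer to stdout. The helper below is hypothetical and not part of the Arithmo codebase:

```
import subprocess

def pot_answer_matches(program: str, ground_truth: str, timeout_s: int = 10) -> bool:
    """Hypothetical helper: run a model-generated Python program in a
    subprocess and compare its printed output with the reference answer."""
    try:
        result = subprocess.run(
            ["python", "-c", program],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False  # treat hanging programs as incorrect
    # Strip whitespace so "55\n" still matches "55".
    return result.stdout.strip() == ground_truth.strip()

print(pot_answer_matches("print(sum(range(1, 11)))", "55"))  # True
```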


## Training procedure


The following `bitsandbytes` quantization config was used during training:
- quant_method: bitsandbytes
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: bfloat16
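
For reference, these settings map onto the `BitsAndBytesConfig` class in `transformers` when loading the base model for QLoRA training. A minimal sketch, assuming recent `transformers` and `bitsandbytes` versions:

```
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization and bfloat16 compute,
# mirroring the config listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)
```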

### Framework versions


- PEFT 0.6.0.dev0



## Installation

```
pip install "transformers>=4.34.0"
pip install accelerate
pip install sentencepiece
pip install protobuf

# If you are GPU poor like me
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

# If you have a GPU.
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu118
pip install scipy
pip install bitsandbytes
```
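
As an optional sanity check, you can confirm that PyTorch imports cleanly and whether a GPU is visible:

```
import torch
print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```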


## How to query the model

```
# Set `run_model_on_gpu` to `False` if you are running on CPU. The model will generate
# reasoning steps along with the answer for your question. If you want it to generate a
# Python program instead, uncomment line 69 of the script, which appends the Python prompt.

$ python query_model.py
```
**Note:** The above script automatically formats the input for you, so you just need to type the question (e.g., `What is 2+2?`) without any prefix like `Question:`. Check out [query_model.py](https://github.com/akjindal53244/Arithmo/blob/master/query_model.py) for more details. <br><br>
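If you would rather not use the script, the adapter can also be loaded directly with `peft`. A minimal sketch; replace the placeholder with this repository's id, and use the prompt format described further below:

```
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16, device_map="auto"
)
# Placeholder: substitute this adapter repository's id.
model = PeftModel.from_pretrained(base, "<this-adapter-repo-id>")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

prompt = "Question: What is 2+2?\n\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```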

##### Sample Input:
```
Question: There are total 10 children. I have to give 1 apple to first child, 2 apples to second child, 3 apples to third child, and so on. How many apples do I need?
```
##### Model Output:
```
Answer: The total number of apples needed is the sum of the first 10 positive integers.
This can be calculated using the formula for the sum of an arithmetic series:
\[S = \frac{n}{2}(a_1 + a_n),\]
where $S$ is the sum, $n$ is the number of terms, $a_1$ is the first term, and $a_n$ is the last term.
In this case, $n = 10$, $a_1 = 1$, and $a_n = 10$.
Plugging these values into the formula, we get:
\[S = \frac{10}{2}(1 + 10) = 5(11) = \boxed{55}.\]
The answer is: 55
```

Arithmo2-Mistral-7B is trained with the same format as [Arithmo-Mistral-7B](https://huggingface.co./akjindal53244/Arithmo-Mistral-7B):
#### CoT Format (generate reasoning steps with answer):
```
Question: <question>

Answer:
```

#### PoT Format (generate a Python program):
```
Question: <question> <python_prompt>

Answer:
```
The model performs best when queried in this format, including from your own script; a small helper for building such prompts is sketched below.
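
The helper below is illustrative: the PoT suffix it appends is an assumption, not the exact wording used by the model; see [query_model.py](https://github.com/akjindal53244/Arithmo/blob/master/query_model.py) for the actual Python prompt.

```
def build_prompt(question: str, pot: bool = False) -> str:
    """Format a question in the CoT or PoT layout shown above."""
    if pot:
        # Illustrative Python prompt; the exact wording lives in query_model.py.
        question += " Write a Python program to solve this."
    return f"Question: {question}\n\nAnswer:"

print(build_prompt("What is 2+2?"))
```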

## Comparing Arithmo models with other SFT LLM models
Results for all models except `Arithmo2-Mistral-7B` are taken from the [MetaMath](https://github.com/meta-math/MetaMath/blob/main/README.MD) repository.

| Model                      | GSM8k Pass@1 | MATH Pass@1 | Fine-tuning          |
|----------------------------|--------------|-------------|----------------------|
| MPT-7B                     | 6.8          | 3.0         | --                   |
| Falcon-7B                  | 6.8          | 2.3         | --                   |
| LLaMA-1-7B                 | 11.0         | 2.9         | --                   |
| LLaMA-2-7B                 | 14.6         | 2.5         | --                   |
| MPT-30B                    | 15.2         | 3.1         | --                   |
| LLaMA-1-13B                | 17.8         | 3.9         | --                   |
| GPT-Neo-2.7B               | 19.5         | --          | --                   |
| Falcon-40B                 | 19.6         | 2.5         | --                   |
| Baichuan-chat-13B          | 23.9         | --          | --                   |
| Vicuna-v1.3-13B            | 27.6         | --          | --                   |
| LLaMA-2-13B                | 28.7         | 3.9         | --                   |
| InternLM-7B                | 31.2         | --          | --                   |
| ChatGLM-2-6B               | 32.4         | --          | --                   |
| GPT-J-6B                   | 34.9         | --          | --                   |
| LLaMA-1-33B                | 35.6         | 3.9         | --                   |
| LLaMA-2-34B                | 42.2         | 6.24        | --                   |
| RFT-7B                     | 50.3         | --          | --                   |
| LLaMA-1-65B                | 50.9         | 10.6        | --                   |
| Qwen-7B                    | 51.6         | --          | --                   |
| WizardMath-7B              | 54.9         | 10.7        | --                   |
| LLaMA-2-70B                | 56.8         | 13.5        | --                   |
| WizardMath-13B             | 63.9         | 14.0        | --                   |
| MetaMath-7B                | 66.5         | 19.8        | --                   |
| MetaMath-13B               | 72.3         | 22.4        | --                   |
| Arithmo-Mistral-7B (PoT)   | 71.2         | --          | SFT: 4-bit QLoRA     |
| Arithmo2-Mistral-7B (PoT)  | 74.2         | --          | SFT: 4-bit QLoRA     |
| MetaMath-Mistral-7B        | 77.7         | 28.2        | SFT: Full fine-tuned |
| Arithmo-Mistral-7B         | 74.7         | 25.3        | SFT: 4-bit QLoRA     |
| 🔥 **Arithmo2-Mistral-7B** | **76.4**     | **27.2**    | **SFT: 4-bit QLoRA** |

If you are interested in reproducing the results, see the [Reproducing Results](https://github.com/akjindal53244/Arithmo#reproducing-results) section.

### Support My Work

Building LLMs takes time and resources; if you find my work interesting, your support would be epic!
<a href="https://www.buymeacoffee.com/a_little_learner" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" style="height: 60px !important;width: 217px !important;" ></a>


### Citation
To cite Arithmo models:
```
@misc{jindal_2023_arithmo,
  author = {Jindal, Ashvini},
  title = {Arithmo-Mistral-7B: Mathematical Reasoning Model},
  howpublished = {Hugging Face},
  month = {October},
  year = {2023},
  url = {https://huggingface.co./akjindal53244/Arithmo-Mistral-7B}
}
```


## References

```
@article{yu2023metamath,
  title={MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models},
  author={Yu, Longhui and Jiang, Weisen and Shi, Han and Yu, Jincheng and Liu, Zhengying and Zhang, Yu and Kwok, James T and Li, Zhenguo and Weller, Adrian and Liu, Weiyang},
  journal={arXiv preprint arXiv:2309.12284},
  year={2023}
}

@article{Yue2023mammoth,
  title={MAmmoTH: Building math generalist models through hybrid instruction tuning},
  author={Yue, Xiang and Qu, Xingwei and Zhang, Ge and Fu, Yao and Huang, Wenhao and Sun, Huan and Su, Yu and Chen, Wenhu},
  journal={arXiv preprint arXiv:2309.05653},
  year={2023}
}

@article{mishra2022lila,
  title={Lila: A unified benchmark for mathematical reasoning},
  author={Mishra, Swaroop and Finlayson, Matthew and Lu, Pan and Tang, Leonard and Welleck, Sean and Baral, Chitta and Rajpurohit, Tanmay and Tafjord, Oyvind and Sabharwal, Ashish and Clark, Peter and Kalyan, Ashwin},
  journal={arXiv preprint arXiv:2210.17517},
  year={2022}
}

```