---
license: apache-2.0
tags:
- llama-cpp
- gguf-my-repo
base_model: FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview
---
# Triangle104/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview-Q3_K_S-GGUF
This model was converted to GGUF format from [`FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview`](https://huggingface.co./FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co./spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co./FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview) for more details on the model.
---
## Overview
FuseO1-Preview is our initial endeavor to enhance the System-II reasoning capabilities of large language models (LLMs) through innovative model fusion techniques. By employing our advanced SCE merging methodologies, we integrate multiple open-source o1-like LLMs into a unified model. Our goal is to incorporate the distinct knowledge and strengths of different reasoning LLMs into a single model with strong System-II reasoning abilities, particularly in mathematics, coding, and science domains.

To achieve this, we conduct two types of model merging:
- **Long-Long Reasoning Merging**: This approach fuses LLMs that all use long-CoT reasoning, with the goal of enhancing long-CoT reasoning capabilities. The resulting FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview achieves a Pass@1 accuracy of 74.0 on AIME24, a significant improvement over OpenAI o1-preview (44.6) and OpenAI o1-mini (63.4), even approaching OpenAI o1 (79.2).
- **Long-Short Reasoning Merging**: This approach fuses long-CoT and short-CoT LLMs, aiming to improve reasoning capabilities in both long and short reasoning processes. The resulting FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview and FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview can utilize both long and short reasoning processes and demonstrate relatively strong performance on long reasoning tasks.
### Long-Long Reasoning Merging
We conduct experiments on the following long-CoT LLMs:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
- Qwen/QwQ-32B-Preview
- NovaSky-AI/Sky-T1-32B-Preview
To reproduce the merged FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview model, use the script below.
```bash
cd FuseAI/FuseO1-Preview/mergekit
pip3 install -e .
model_save_dir=xxx # your path to save the merged models
mergekit-yaml fuseo1_configs/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview.yaml ${model_save_dir}/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview --cuda
```
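The merge itself is declared in the YAML files under fuseo1_configs/. As a rough sketch of what such an SCE merge config looks like in mergekit (the base model and parameter values here are assumptions for illustration, not necessarily the exact released config):
```yaml
# Illustrative sketch only -- the released fuseo1_configs/*.yaml may differ.
merge_method: sce                # SCE merging as implemented in mergekit
base_model: Qwen/Qwen2.5-32B     # assumed pivot model for the merge
models:
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
  - model: Qwen/QwQ-32B-Preview
  - model: NovaSky-AI/Sky-T1-32B-Preview
parameters:
  select_topk: 1.0               # assumed: fraction of salient parameters selected per tensor
dtype: bfloat16
```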
To reproduce the merged FuseAI/FuseO1-DeepSeekR1-QwQ-32B-Preview model, use the script below.
```bash
cd FuseAI/FuseO1-Preview/mergekit
pip3 install -e .
model_save_dir=xxx # your path to save the merged models
mergekit-yaml fuseo1_configs/FuseO1-DeepSeekR1-QwQ-32B-Preview.yaml ${model_save_dir}/FuseO1-DeepSeekR1-QwQ-32B-Preview --cuda
```
We provide example code for using FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview with vLLM.
```python
from vllm import LLM, SamplingParams

# tensor_parallel_size=8 shards the model across eight GPUs; adjust to your hardware.
llm = LLM(model="FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview", tensor_parallel_size=8)
sampling_params = SamplingParams(max_tokens=32768, temperature=0.7, stop=["<|im_end|>", "<|end▁of▁sentence|>"], stop_token_ids=[151645, 151643])
conversations = [
    [
        {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
        {"role": "user", "content": "Quadratic polynomials $P(x)$ and $Q(x)$ have leading coefficients $2$ and $-2,$ respectively. The graphs of both polynomials pass through the two points $(16,54)$ and $(20,53).$ Find $P(0) + Q(0)$."},
    ],
]
responses = llm.chat(messages=conversations, sampling_params=sampling_params, use_tqdm=True)
for response in responses:
    print(response.outputs[0].text.strip())
```
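A note on the settings above: tensor_parallel_size=8 assumes eight GPUs and can be lowered to match your setup, and the stop_token_ids (151645, 151643) are the token IDs of the two listed stop strings in the merged model's Qwen-based tokenizer.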
### Long-Short Reasoning Merging
We conduct experiments on the following long-CoT and short-CoT LLMs:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
- Qwen/Qwen2.5-32B-Instruct
- Qwen/Qwen2.5-32B-Coder
To reproduce the merged FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-32B-Preview model, use the script below.
```bash
cd FuseAI/FuseO1-Preview/mergekit
pip3 install -e .
model_save_dir=xxx # your path to save the merged models
mergekit-yaml fuseo1_configs/FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-32B-Preview.yaml ${model_save_dir}/FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-32B-Preview --cuda
```
To reproduce the merged FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview model, use the script below.
```bash
cd FuseAI/FuseO1-Preview/mergekit
pip3 install -e .
model_save_dir=xxx # your path to save the merged models
mergekit-yaml fuseo1_configs/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview.yaml ${model_save_dir}/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview --cuda
```
To reproduce the merged FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview model, use the script below.
```bash
cd FuseAI/FuseO1-Preview/mergekit
pip3 install -e .
model_save_dir=xxx # your path to save the merged models
mergekit-yaml fuseo1_configs/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview.yaml ${model_save_dir}/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview --cuda
```
We provide example code for using FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview with vLLM.
```python
from vllm import LLM, SamplingParams

# tensor_parallel_size=8 shards the model across eight GPUs; adjust to your hardware.
llm = LLM(model="FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview", tensor_parallel_size=8)
sampling_params = SamplingParams(max_tokens=32768, temperature=0.7, stop=["<|im_end|>", "<|end▁of▁sentence|>"], stop_token_ids=[151645, 151643])
conversations = [
    [
        {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
        {"role": "user", "content": "Quadratic polynomials $P(x)$ and $Q(x)$ have leading coefficients $2$ and $-2,$ respectively. The graphs of both polynomials pass through the two points $(16,54)$ and $(20,53).$ Find $P(0) + Q(0)$."},
    ],
]
responses = llm.chat(messages=conversations, sampling_params=sampling_params, use_tqdm=True)
for response in responses:
    print(response.outputs[0].text.strip())
```
---
## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)
```bash
brew install llama.cpp
```
Invoke the llama.cpp server or the CLI.
### CLI:
```bash
llama-cli --hf-repo Triangle104/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview-Q3_K_S-GGUF --hf-file fuseo1-deepseekr1-qwq-skyt1-32b-preview-q3_k_s.gguf -p "The meaning to life and the universe is"
```
### Server:
```bash
llama-server --hf-repo Triangle104/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview-Q3_K_S-GGUF --hf-file fuseo1-deepseekr1-qwq-skyt1-32b-preview-q3_k_s.gguf -c 2048
```
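Once the server is up, it exposes an OpenAI-compatible HTTP API (on port 8080 by default), so you can query it with a plain curl request; the prompt below is just an illustrative example:
```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "Please reason step by step."},
      {"role": "user", "content": "What is 27 * 14?"}
    ],
    "temperature": 0.7
  }'
```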
Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the llama.cpp repo.
Step 1: Clone llama.cpp from GitHub.
```bash
git clone https://github.com/ggerganov/llama.cpp
```
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
```bash
cd llama.cpp && LLAMA_CURL=1 make
```
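Note: newer llama.cpp releases have replaced the Makefile with a CMake build. If `make` fails on a recent checkout, the rough CMake equivalent (flag names per current llama.cpp; treat this as a sketch) is:
```bash
cmake -B build -DLLAMA_CURL=ON        # add -DGGML_CUDA=ON for Nvidia GPUs
cmake --build build --config Release  # binaries land in build/bin/
```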
Step 3: Run inference through the main binary.
```bash
./llama-cli --hf-repo Triangle104/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview-Q3_K_S-GGUF --hf-file fuseo1-deepseekr1-qwq-skyt1-32b-preview-q3_k_s.gguf -p "The meaning to life and the universe is"
```
or
```bash
./llama-server --hf-repo Triangle104/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview-Q3_K_S-GGUF --hf-file fuseo1-deepseekr1-qwq-skyt1-32b-preview-q3_k_s.gguf -c 2048
```