|
--- |
|
library_name: transformers |
|
tags: [] |
|
--- |
|
|
|
# llm-jp-13b-OpenMathInstruct-2-v1.1
|
|
|
**Development discontinued**

No improvement in accuracy on mathematical tasks could be expected, so development of this model has stopped.
|
## Overview |
|
This model is an instruction-tuned variant of [llm-jp/llm-jp-3-13b-instruct](https://huggingface.co./llm-jp/llm-jp-3-13b-instruct), further fine-tuned on a 256,000-sample subset of [nvidia/OpenMathInstruct-2](https://huggingface.co./datasets/nvidia/OpenMathInstruct-2). Fine-tuning followed a parameter-efficient strategy: every model parameter was frozen except the language-model head (`lm_head`).
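
For reference, assembling a 256,000-sample subset with the `datasets` library could look like the sketch below. The exact selection criteria used for this model are not documented, so the seeded random shuffle-and-slice here is purely an assumption:

```python
from datasets import load_dataset

# Load OpenMathInstruct-2 and take a 256,000-sample subset.
# NOTE: the actual sampling strategy is undocumented; a seeded random
# shuffle followed by a slice is shown only as an illustration.
dataset = load_dataset("nvidia/OpenMathInstruct-2", split="train")
subset = dataset.shuffle(seed=42).select(range(256_000))
print(subset)
```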
|
|
|
## Key Features |
|
- **Base Model**: [llm-jp/llm-jp-3-13b-instruct](https://huggingface.co./llm-jp/llm-jp-3-13b-instruct) |
|
- **Fine-Tuning Data**: 256,000 samples from [nvidia/OpenMathInstruct-2](https://huggingface.co./datasets/nvidia/OpenMathInstruct-2) |
|
- **Updated Parameters**: |
|
  - All parameters were frozen except the language-model head (`lm_head`); a sanity check is included in the snippet below:
|
```python
# Freeze every parameter in the model first...
for param in model.parameters():
    param.requires_grad = False

# ...then re-enable gradients for the language-model head only.
for param in model.lm_head.parameters():
    param.requires_grad = True
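
# Optional sanity check (not part of the original training script):
# confirm that only the lm_head parameters remain trainable.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,}")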
|
``` |
|
- **Marco-o1 Tokens Added**: To align with [Marco-o1](https://arxiv.org/pdf/2411.14405v1), we introduced the following special tokens (a registration sketch follows this list):
|
- `<Thought>`, `</Thought>` |
|
- `<Output>`, `</Output>` |
|
- **Reasoning Model Integration**: Uses the implementation from [Hajime-Y/reasoning-model](https://github.com/Hajime-Y/reasoning-model) |
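
The token-registration step itself is not shown in this repository; with the standard `transformers` API it would look roughly like the following sketch (variable names are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "llm-jp/llm-jp-3-13b-instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Register the Marco-o1-style reasoning tags as special tokens so each
# tag is tokenized as a single unit instead of being split into pieces.
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<Thought>", "</Thought>", "<Output>", "</Output>"]}
)

# Grow the embedding matrix to cover the newly added vocabulary entries.
model.resize_token_embeddings(len(tokenizer))
```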
|
|
|
## Usage |
|
Below is an example of using the model with Monte Carlo Tree Search (MCTS) for reasoning: |
|
|
|
```python |
|
import sys |
|
import torch |
|
# Make a local clone of Hajime-Y/reasoning-model importable
sys.path.append('./reasoning-model')
|
from reasoning_model import ReasoningModelForCausalLM |
|
|
from transformers import AutoTokenizer |
|
|
|
# Prepare the tokenizer and model
|
model_name = "doshisha-mil/llm-jp-13b-OpenMathInstruct-2-v1.1" |
|
|
|
tokenizer = AutoTokenizer.from_pretrained( |
|
model_name, |
|
trust_remote_code=True, |
|
) |
|
|
|
# Explicitly set the padding token
|
if tokenizer.pad_token is None: |
|
tokenizer.pad_token = tokenizer.eos_token |
|
|
|
# Load the model
|
model = ReasoningModelForCausalLM.from_pretrained( |
|
model_name, |
|
torch_dtype="auto", |
|
device_map="auto" |
|
) |
|
|
|
# Input text (the prompt template is Japanese, matching the base model's instruction format)
|
prompt = "Find the number of positive integers $x$ that satisfy $x^{-1}>x$." |
|
text = f"あなたは優秀で論理的なアシスタントです。まずは<Thought></Thought>タグの中であなたの思考の過程を記載し、<Output></Output>タグの中に最終的にユーザーに提供する出力を記載します。\n\n### 指示: {prompt}\n\n### 応答: <Thought>\n" |
|
|
|
# Tokenize with explicit attention_mask |
|
model_inputs = tokenizer([text], return_tensors="pt", padding=True, truncation=True) |
|
model_inputs["attention_mask"] = (model_inputs["input_ids"] != tokenizer.pad_token_id).long() |
|
|
|
# Move all inputs to the model's device
|
model_inputs = {key: val.to(model.device) for key, val in model_inputs.items()} |
|
|
|
# Generate with MCTS
|
final_tokens, final_node = model.generate( |
|
input_ids=model_inputs["input_ids"], |
|
    attention_mask=model_inputs["attention_mask"],  # pass the attention mask explicitly
|
iterations_per_step=3, |
|
max_iterations=30, |
|
mini_step_size=32, |
|
expand_threshold=0, |
|
step_separator_ids=None, |
|
) |
|
|
|
# Decode the result to text
final_text = tokenizer.decode(final_tokens, skip_special_tokens=True)
print("=== Final generated text ===")
|
print(final_text) |
|
|
|
``` |
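
Because the prompt template instructs the model to place its final answer inside `<Output></Output>` tags, the generation can be post-processed to keep only that part. A minimal sketch, assuming the tags survive in the token stream (so special tokens are kept when decoding this time):

```python
import re

# Decode again, keeping special tokens so the <Output> markers are visible.
raw_text = tokenizer.decode(final_tokens, skip_special_tokens=False)

# Take everything between <Output> and </Output>, falling back to the end
# of the text if the closing tag was never generated.
match = re.search(r"<Output>(.*?)(?:</Output>|\Z)", raw_text, re.DOTALL)
if match:
    print(match.group(1).strip())
```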
|
|
|
## Model Applications |
|
- Mathematical problem-solving with structured reasoning |
|
- Chain-of-Thought (CoT) enhanced reasoning |
|
- Integration with Monte Carlo Tree Search (MCTS) |
|
- Instruction-based question answering |
|
|
|
## References |
|
- **Base Model**: [llm-jp/llm-jp-3-13b-instruct](https://huggingface.co./llm-jp/llm-jp-3-13b-instruct) |
|
- **Dataset**: [nvidia/OpenMathInstruct-2](https://huggingface.co./datasets/nvidia/OpenMathInstruct-2) |
|
- **Marco-o1 Paper**: [arXiv:2411.14405v1](https://arxiv.org/pdf/2411.14405v1) |
|
- **Reasoning Model Code**: [Hajime-Y/reasoning-model](https://github.com/Hajime-Y/reasoning-model) |
|
|
|
## Citation |
|
If you use this model, please cite the original base model and relevant datasets. |
|
|
|
```bibtex |
|
@misc{llm-jp3-13b-instruct,
  title={LLM-jp-3-13b-instruct},
  author={LLM-jp Team},
  year={2024},
  howpublished={Hugging Face repository},
  url={https://huggingface.co./llm-jp/llm-jp-3-13b-instruct}
}

@misc{marco-o1,
  title={Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions},
  author={Yu Zhao and Huifeng Yin and Bo Zeng and Hao Wang and Tianqi Shi and Chenyang Lyu and Longyue Wang and Weihua Luo and Kaifu Zhang},
  year={2024},
  eprint={2411.14405},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
|
``` |
|
|
|
## License |
|
Refer to the base model's license at [llm-jp/llm-jp-3-13b-instruct](https://huggingface.co./llm-jp/llm-jp-3-13b-instruct) for details. |
|
|
|
|
|
|
|