Introduction
This model was built for submission to the competition in the 2024 LLM course run by the Matsuo-Iwasawa Lab at the University of Tokyo.
It was created by applying SFT with QLoRA to llm-jp/llm-jp-3-13b; only the LoRA adapter is uploaded here.
The chat template is identical to that of weblab-GENIAC/Tanuki-8B-dpo-v1.0.
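For reference, below is a minimal sketch of what the QLoRA SFT setup could look like with Unsloth (the library credited in the "Uploaded model" section at the bottom); the rank, alpha, and target modules are illustrative assumptions, not the published training configuration.

from unsloth import FastLanguageModel

# Load the base model with 4-bit quantized weights (the "Q" in QLoRA).
model, tokenizer = FastLanguageModel.from_pretrained(
    "llm-jp/llm-jp-3-13b",
    max_seq_length=1024,
    load_in_4bit=True,
)
# Attach trainable LoRA adapters; all values below are assumptions.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)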
Inference
Run inference in the provided environment as follows, using vLLM on an instance with a single L4 GPU.
Each block below corresponds to one cell in a Jupyter Notebook; run the cells in order.
!pip uninstall numpy -y
!pip install numpy==1.26.4
%%time
%pip install vllm==0.6.4.post1 --force-reinstall
!pip install ipywidgets
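After the installs, it is worth confirming which NumPy ended up active, since the vLLM reinstall can pull in a version other than the 1.26.4 pinned above (the pin presumably works around NumPy 2.x incompatibilities):

import numpy
print(numpy.__version__)  # if this is not 1.26.4, re-run the pin above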
import time
import torch
import transformers
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    BitsAndBytesConfig,
)
import vllm
from vllm.lora.request import LoRARequest
from jinja2 import Template
print(vllm.__version__)
MAX_LENGTH = 1024
MODEL_NAME = "llm-jp/llm-jp-3-13b"
print(MODEL_NAME)
import os
os.environ["HF_TOKEN"] = "your Hugging Face token"
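As a safer alternative to hardcoding the token in the notebook, you can prompt for it interactively; this is a minimal sketch, and any method that sets HF_TOKEN works.

from getpass import getpass
os.environ["HF_TOKEN"] = getpass("Hugging Face token: ")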
llm = vllm.LLM(
    MODEL_NAME,
    tensor_parallel_size=1,
    gpu_memory_utilization=0.95,
    trust_remote_code=True,
    enforce_eager=True,  # skip CUDA graph capture to reduce memory overhead
    max_model_len=MAX_LENGTH,
    enable_lora=True,
    quantization="bitsandbytes",  # load the base model in 4-bit
    load_format="bitsandbytes",
)
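Optionally, sanity-check how much of the L4's memory the engine claimed (a quick check, assuming a single visible CUDA device):

free, total = torch.cuda.mem_get_info()
print(f"GPU memory in use: {(total - free) / 2**30:.1f} / {total / 2**30:.1f} GiB")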
tokenizer = llm.get_tokenizer()
sft_tokenizer = AutoTokenizer.from_pretrained(
    "weblab-GENIAC/Tanuki-8B-dpo-v1.0"
)
tokenizer.chat_template = sft_tokenizer.chat_template
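To verify the swapped-in template, render a dummy exchange as text (tokenize=False returns the prompt string instead of token IDs):

print(tokenizer.apply_chat_template(
    [{"role": "user", "content": "こんにちは"}],
    tokenize=False,
    add_generation_prompt=True,
))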
from huggingface_hub import snapshot_download
lora_path = snapshot_download(repo_id="OsakanaTeishoku/1204lora")
from datasets import load_dataset
data_files = {"test": "elyza-tasks-100-TV_0.jsonl"}
tasks = load_dataset("json", data_files=data_files, split="test")
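The cells below assume each record has an "input" field; printing the first record is a quick way to confirm the schema:

print(tasks[0])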
messages_list = [
    [{"role": "user", "content": tasks["input"][i]}] for i in range(len(tasks))
]
prompts = [line[0]["content"] for line in messages_list]
# tokenize=True (the default) makes apply_chat_template return token IDs directly
prompt_token_ids = [
    tokenizer.apply_chat_template(messages, add_generation_prompt=True)
    for messages in messages_list
]
sampling_params = vllm.SamplingParams(
    temperature=0.7,
    max_tokens=1024,
    repetition_penalty=1.05,
    top_p=0.9,
)
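These settings trade some determinism for fluency; if you want reproducible outputs instead, temperature=0 makes vLLM decode greedily (shown as an alternative, not what was used for the submission):

greedy_params = vllm.SamplingParams(
    temperature=0,  # greedy decoding
    max_tokens=1024,
)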
outputs = llm.generate(
    prompt_token_ids=prompt_token_ids,
    sampling_params=sampling_params,
    lora_request=LoRARequest("lora", 1, lora_path),  # apply the LoRA adapter
)
for prompt, response in zip(prompts, outputs):
    print("prompt:", prompt)
    print("output:", response.outputs[0].text.strip())
    print("-" * 80)
import json

data = [{
    "task_id": i,
    # "input": prompts[i],
    "output": outputs[i].outputs[0].text.strip()
} for i in range(len(tasks))]
file_path_with_unicode = 'output.jsonl'
with open(file_path_with_unicode, 'w', encoding='utf-8') as file:
    for entry in data:
        json.dump(entry, file, ensure_ascii=False)
        file.write('\n')
print(f"Saved json {file_path_with_unicode} !")
Change log
- 2024/12/26: Removed redundant comments from the inference code and added links.
Uploaded model
- Developed by: OsakanaTeishoku
- License: cc-by-nc-sa-4.0
- Finetuned from model: llm-jp/llm-jp-3-13b

This model was trained 2x faster with Unsloth and Hugging Face's TRL library.