---
base_model: llm-jp/llm-jp-3-13b
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
language:
- ja
license: cc-by-nc-sa-4.0
---
# Introduction
This model was built for submission to the competition in the University of Tokyo Matsuo-Iwasawa Lab's 2024 LLM course.
It applies QLoRA-based SFT to [llm-jp/llm-jp-3-13b](https://huggingface.co./llm-jp/llm-jp-3-13b); only the resulting LoRA adapter is uploaded here.
The chat template is identical to that of [weblab-GENIAC/Tanuki-8B-dpo-v1.0](https://huggingface.co./weblab-GENIAC/Tanuki-8B-dpo-v1.0).
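If you just want to try the adapter outside the competition environment, a minimal sketch with `transformers` looks like the following (an assumption on my part, not the official setup: it requires the `peft` and `accelerate` libraries and enough GPU memory for the 13B base model):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then attach this repo's LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained(
    "llm-jp/llm-jp-3-13b", torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(base, "OsakanaTeishoku/1204lora")

# Use the Tanuki-8B-dpo-v1.0 chat template, which matches the one used during SFT.
tokenizer = AutoTokenizer.from_pretrained("llm-jp/llm-jp-3-13b")
tokenizer.chat_template = AutoTokenizer.from_pretrained(
    "weblab-GENIAC/Tanuki-8B-dpo-v1.0"
).chat_template

messages = [{"role": "user", "content": "こんにちは"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```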
# Inference
Run inference in the provided environment as follows, using vLLM on an instance with a single L4 GPU.
In the Jupyter Notebook, each chunk below is one cell; run the cells in order. (Note that `%%time` must stay the first line of its cell.)
```python
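# Pin numpy to 1.26.4 up front to avoid numpy 2.x incompatibilities.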
!pip uninstall numpy -y
!pip install numpy==1.26.4
%%time
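# Force-reinstall vLLM 0.6.4.post1; %%time reports how long the install takes.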
%pip install vllm==0.6.4.post1 --force-reinstall
!pip install ipywidgets
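
# Imports for inference; printing the vLLM version confirms the install.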
import time
import torch
import transformers
from transformers import (
AutoTokenizer,
AutoModelForCausalLM,
BitsAndBytesConfig
)
import vllm
from vllm.lora.request import LoRARequest
from jinja2 import Template
print(vllm.__version__)
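
# Base model ID and maximum sequence length for vLLM.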
MAX_LENGTH = 1024
MODEL_NAME = "llm-jp/llm-jp-3-13b"
print(MODEL_NAME)
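
# Hugging Face token used to download the model and adapter from the Hub.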
import os
os.environ["HF_TOKEN"] = "あなたのHugging Faceトークン"
from vllm.lora.request import LoRARequest
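
# Load the base model with 4-bit bitsandbytes quantization and LoRA support enabled.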
llm = vllm.LLM(
MODEL_NAME,
tensor_parallel_size=1,
gpu_memory_utilization=0.95,
trust_remote_code=True,
enforce_eager=True,
max_model_len=MAX_LENGTH,
enable_lora=True,
quantization="bitsandbytes",
load_format="bitsandbytes"
)
tokenizer = llm.get_tokenizer()
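
# Swap in the Tanuki-8B-dpo-v1.0 chat template, the one used during SFT.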
from transformers import AutoTokenizer
sft_tokenizer = AutoTokenizer.from_pretrained(
"weblab-GENIAC/Tanuki-8B-dpo-v1.0"
)
tokenizer.chat_template = sft_tokenizer.chat_template
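
# Download this repository's LoRA adapter from the Hub.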
from huggingface_hub import snapshot_download
lora_path = snapshot_download(repo_id="OsakanaTeishoku/1204lora")
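
# Load the evaluation tasks from the local JSONL file.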
from datasets import load_dataset
data_files = {"test": "elyza-tasks-100-TV_0.jsonl"}
tasks = load_dataset("json", data_files=data_files, split="test")
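
# Wrap each task input as a single-turn chat and pre-tokenize it with the chat template.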
messages_list = [
[{"role": "user", "content": tasks["input"][i]}] for i in range(len(tasks))
]
prompts = [line[0]["content"] for line in messages_list]
prompt_token_ids = [tokenizer.apply_chat_template(messages, add_generation_prompt=True) for messages in messages_list]
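
# Sampling settings for generation.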
sampling_params = vllm.SamplingParams(
temperature=0.7,
max_tokens=1024,
repetition_penalty=1.05,
top_p=0.9,
)
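
# Generate with the LoRA adapter applied on top of the base model.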
outputs = llm.generate(
prompt_token_ids=prompt_token_ids,
sampling_params=sampling_params,
lora_request=LoRARequest("lora", 1, lora_path), # LoRA adapter
)
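
# Print each prompt together with its generated answer.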
for prompt, response in zip(prompts, outputs):
print("prompt:", prompt)
print("output:", response.outputs[0].text.strip())
print("-"*80)
import json
data = [{
"task_id": i,
#"input": prompts[i],
"output": outputs[i].outputs[0].text.strip()
} for i in range(len(tasks))]
file_path_with_unicode = 'output.jsonl'
with open(file_path_with_unicode, 'w', encoding='utf-8') as file:
for entry in data:
json.dump(entry, file, ensure_ascii=False)
file.write('\n')
print(f"Saved json {file_path_with_unicode} !")
```
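The resulting `output.jsonl` holds one JSON object per task, with `task_id` and `output` keys. A quick sanity check of the file might look like this (illustrative only, not required for submission):

```python
import json

# Read the submission file back and confirm every task produced an entry.
with open("output.jsonl", encoding="utf-8") as f:
    rows = [json.loads(line) for line in f]

print(len(rows))        # should match the number of tasks
print(sorted(rows[0]))  # ['output', 'task_id']
```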
# Change log
- 2024/12/26: Removed extraneous comments from the inference code and added links.
# Uploaded model
- **Developed by:** OsakanaTeishoku
- **License:** cc-by-nc-sa-4.0
- **Finetuned from model:** llm-jp/llm-jp-3-13b
This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)