---
base_model: llm-jp/llm-jp-3-13b
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
license: apache-2.0
language:
- en
- ja
datasets:
- weblab-GENIAC/aya-ja-evol-instruct-calm3-dpo-masked
- DeL-TaiseiOzaki/Tengentoppa-sft-mini-vol1.0
---

## Inference code

* Install the following packages first:
  * `pip install -q numpy==1.26.4`
  * `pip install -q vllm==0.6.4`
  * `pip install -q bitsandbytes==0.44.1`

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest
import json
from datasets import load_dataset
from huggingface_hub import snapshot_download

# Download the LoRA adapter from the Hub.
adapter_id = "llm-jp-3-13b-it-bs4-ac10-step370-lora"
lora_path = snapshot_download(repo_id="jaked97/" + adapter_id)

# Local snapshot of the base model.
model_id = "models/models--llm-jp--llm-jp-3-13b/snapshots/cd3823f4c1fcbb0ad2e2af46036ab1b0ca13192a"

# Load the evaluation tasks and build prompts in the instruction format
# used during fine-tuning. (The system line says: "You are a helpful AI assistant.")
tasks = load_dataset("json", data_files="./elyza-tasks-100-TV_0.jsonl", split="train")
prompts = [
    f"""### instruction:
あなたは親切なAIアシスタントです。
### input:
{task_input}
### output:
"""
    for task_input in tasks["input"]
]

# Load the 4-bit (bitsandbytes) base model with LoRA support enabled.
llm = LLM(
    model=model_id,
    gpu_memory_utilization=0.99,
    quantization="bitsandbytes",
    load_format="bitsandbytes",
    trust_remote_code=True,
    enforce_eager=True,
    enable_lora=True,
    max_lora_rank=64,
)

# Greedy decoding with a fixed seed, applying the LoRA adapter per request.
outputs = llm.generate(
    prompts,
    sampling_params=SamplingParams(
        temperature=0,
        max_tokens=1024,
        min_tokens=1,
        repetition_penalty=1.2,
        skip_special_tokens=True,
        seed=97,
    ),
    lora_request=LoRARequest("sql_adapter", 1, lora_path),
)

# Write one JSON object per task to a JSONL results file.
with open(f"./{adapter_id}_max1024-nf4-vllm.jsonl", "w", encoding="utf-8") as f:
    for task, output in zip(tasks, outputs):
        result = {
            "task_id": task["task_id"],
            "input": task["input"],
            "output": output.outputs[0].text,
        }
        json.dump(result, f, ensure_ascii=False)
        f.write("\n")
```

# Uploaded model

- **Developed by:** jaked97
- **License:** apache-2.0
- **Finetuned from model:** llm-jp/llm-jp-3-13b

This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
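For environments without vLLM, the adapter can also be attached with Hugging Face `transformers` and `peft`. The following is a minimal sketch, not part of the original evaluation pipeline: it assumes the uploaded adapter (`jaked97/llm-jp-3-13b-it-bs4-ac10-step370-lora`, the same repo the script above downloads) is in standard PEFT format, and the example question is illustrative only.

```python
# Minimal sketch: 4-bit base model + LoRA adapter via transformers/peft.
# Assumes the adapter repo is PEFT-compatible; not the original vLLM pipeline.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "llm-jp/llm-jp-3-13b"
adapter_id = "jaked97/llm-jp-3-13b-it-bs4-ac10-step370-lora"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # 4-bit, as in the vLLM setup
    device_map="auto",
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the LoRA weights

# Same prompt format as above; the input question is a made-up example.
prompt = (
    "### instruction:\nあなたは親切なAIアシスタントです。\n"
    "### input:\n日本の首都はどこですか？\n"
    "### output:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=256, repetition_penalty=1.2)
# Decode only the newly generated tokens.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```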