Update README.md
README.md CHANGED
@@ -113,6 +113,46 @@ response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```

### Use with vLLM

[vllm-project/vllm](https://github.com/vllm-project/vllm)

```python
from vllm import LLM, SamplingParams

model_id = "haqishen/Llama-3-8B-Japanese-Instruct"

llm = LLM(
    model=model_id,
    trust_remote_code=True,
    tensor_parallel_size=2,  # shard the model across 2 GPUs
)
tokenizer = llm.get_tokenizer()

messages = [
    # "You are a pirate chatbot who always replies in pirate speak!"
    {"role": "system", "content": "あなたは、常に海賊の言葉で返事する海賊チャットボットです!"},
    # "Please introduce yourself."
    {"role": "user", "content": "自己紹介してください"},
]

# Render the chat template to a plain prompt string; vLLM tokenizes it internally.
conversations = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

outputs = llm.generate(
    [conversations],
    SamplingParams(
        temperature=0.6,
        top_p=0.9,
        max_tokens=1024,
        # Stop on the standard EOS token and on Llama 3's <|eot_id|> turn terminator.
        stop_token_ids=[tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")],
    ),
)
print(outputs[0].outputs[0].text.strip())
```
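Note that `tensor_parallel_size=2` assumes two visible GPUs; on a single GPU, set it to 1 or omit the argument (it defaults to 1).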
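Not part of this commit, but since vLLM batches prompts natively, the same `llm` and `tokenizer` can serve several conversations in one `generate` call. A minimal sketch, reusing the objects from the example above; the questions are purely illustrative:

```python
# Sketch: batched generation (assumes `llm` and `tokenizer` from the example above).
questions = [
    "日本の首都はどこですか?",        # "What is the capital of Japan?"
    "富士山について教えてください",    # "Tell me about Mt. Fuji."
]
prompts = [
    tokenizer.apply_chat_template(
        [{"role": "user", "content": q}],
        tokenize=False,
        add_generation_prompt=True,
    )
    for q in questions
]
params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=1024)
outputs = llm.generate(prompts, params)  # one call; vLLM schedules all prompts together
for output in outputs:
    print(output.outputs[0].text.strip())
```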
## Examples

```