masao1211/karakuri-lm-70b-chat-v0.1-AWQ

Text Generation

text-generation-inference

Inference Endpoints

4-bit precision


masao1211 committed on Mar 25

Commit eb1cbe8 (1 parent: b84c3ac)

Update README.md

Files changed (1)

README.md +25 -0


@@ -14,6 +14,8 @@ pipeline_tag: text-generation
 This repo contains AWQ model files for [KARAKURI LM 70B Chat v0.1](https://huggingface.co/karakuri-ai/karakuri-lm-70b-chat-v0.1).
+## How to get the AWQ model
 I created the AWQ model files using autoawq==0.2.3.
 ```bash
 pip install autoawq==0.2.3
@@ -38,4 +40,27 @@ model.quantize(tokenizer, quant_config=quant_config, calib_data="mmnga/wikipedia
 quant_path = "karakuri-lm-70b-v0.1-AWQ"
 model.save_quantized(quant_path)
 tokenizer.save_pretrained(quant_path)
+```
+## Usage
+```python
+from vllm import LLM, SamplingParams
+sampling_params = SamplingParams(temperature=0.0, max_tokens=100)
+llm = LLM(model="masao1211/karakuri-lm-70b-chat-v0.1-AWQ", max_model_len=4096)
+system_prompt = "System prompt"
+messages = [{"role": "system", "content": system_prompt}]
+messages.append({"role": "user", "content": "User Prompt"})
+prompt = llm.llm_engine.tokenizer.tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)
+prompts = [prompt]
+outputs = llm.generate(prompts, sampling_params)
+for output in outputs:
+    prompt = output.prompt
+    generated_text = output.outputs[0].text
+    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
 ```
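
The usage snippet added in this commit builds one prompt inline. As a minimal sketch of the same flow for several user prompts at once, the helpers below wrap message construction and batching. `build_messages` and `generate_replies` are hypothetical names, not part of the README or of vLLM. The sketch assumes an installed vLLM version that provides `LLM.get_tokenizer()` and a machine with enough GPU memory for the 70B AWQ weights. The `vllm` import is deferred so that the message helper can be used without vLLM installed.

```python
from typing import Dict, List


def build_messages(system_prompt: str, user_prompt: str) -> List[Dict[str, str]]:
    """Return a system + user conversation in the role/content format used in the README."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def generate_replies(
    system_prompt: str, user_prompts: List[str], max_tokens: int = 100
) -> List[str]:
    """Hypothetical helper: greedy-decode one reply per user prompt in a single batch."""
    # Imported here so build_messages works on machines without vLLM.
    from vllm import LLM, SamplingParams

    llm = LLM(model="masao1211/karakuri-lm-70b-chat-v0.1-AWQ", max_model_len=4096)
    # Assumption: LLM.get_tokenizer() is available in the installed vLLM version.
    tokenizer = llm.get_tokenizer()

    # Render each conversation to a prompt string with the model's chat template.
    prompts = [
        tokenizer.apply_chat_template(
            conversation=build_messages(system_prompt, user_prompt),
            add_generation_prompt=True,
            tokenize=False,
        )
        for user_prompt in user_prompts
    ]

    # temperature=0.0 selects greedy decoding, as in the README snippet.
    sampling_params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    outputs = llm.generate(prompts, sampling_params)
    return [output.outputs[0].text for output in outputs]
```

Calling `generate_replies("System prompt", ["User Prompt"])` would load the model once and return a list with one generated string per user prompt.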