Copycats
/

Synatra-kiqu-10.7B-AWQ

Text Generation

text-generation-inference

4-bit precision

Model card Files Files and versions Community

Copycats commited on Apr 6

Commit

c061b5d

•

1 Parent(s): d67fe47

Update README.md

Files changed (1) hide show

README.md +4 -1

README.md CHANGED Viewed

@@ -44,8 +44,11 @@ Documentation on installing and using vLLM [can be found here](https://vllm.read
 - vLLM can be deployed as a server that implements the OpenAI API protocol. This allows vLLM to be used as a drop-in replacement for applications using OpenAI API
 ```shell
-python3 -m vllm.entrypoints.openai.api_server --model Copycats/Synatra-kiqu-10.7B-awq --quantization awq --dtype auto
 ```
 #### Querying the model using OpenAI Chat API:
 - You can use the create chat completion endpoint to communicate with the model in a chat-like interface:

 - vLLM can be deployed as a server that implements the OpenAI API protocol. This allows vLLM to be used as a drop-in replacement for applications using OpenAI API
 ```shell
+python3 -m vllm.entrypoints.openai.api_server --model Copycats/Synatra-kiqu-10.7B-awq --quantization awq --dtype half
 ```
+ - `--model`: huggingface model path
+ - `--quantization`: ”awq”
+ - `--dtype`: “half” for FP16. Recommended for AWQ quantization.
 #### Querying the model using OpenAI Chat API:
 - You can use the create chat completion endpoint to communicate with the model in a chat-like interface: