Copycats commited on
Commit
c061b5d
1 Parent(s): d67fe47

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +4 -1
README.md CHANGED
@@ -44,8 +44,11 @@ Documentation on installing and using vLLM [can be found here](https://vllm.read
44
  - vLLM can be deployed as a server that implements the OpenAI API protocol. This allows vLLM to be used as a drop-in replacement for applications using OpenAI API
45
 
46
  ```shell
47
- python3 -m vllm.entrypoints.openai.api_server --model Copycats/Synatra-kiqu-10.7B-awq --quantization awq --dtype auto
48
  ```
 
 
 
49
 
50
  #### Querying the model using OpenAI Chat API:
51
  - You can use the create chat completion endpoint to communicate with the model in a chat-like interface:
 
44
  - vLLM can be deployed as a server that implements the OpenAI API protocol. This allows vLLM to be used as a drop-in replacement for applications using OpenAI API
45
 
46
  ```shell
47
+ python3 -m vllm.entrypoints.openai.api_server --model Copycats/Synatra-kiqu-10.7B-awq --quantization awq --dtype half
48
  ```
49
+ - `--model`: huggingface model path
50
+ - `--quantization`: ”awq”
51
+ - `--dtype`: “half” for FP16. Recommended for AWQ quantization.
52
 
53
  #### Querying the model using OpenAI Chat API:
54
  - You can use the create chat completion endpoint to communicate with the model in a chat-like interface: