Update README.md
README.md
CHANGED
@@ -44,8 +44,11 @@ Documentation on installing and using vLLM [can be found here](https://vllm.read
 - vLLM can be deployed as a server that implements the OpenAI API protocol. This allows vLLM to be used as a drop-in replacement for applications using OpenAI API
 
 ```shell
-python3 -m vllm.entrypoints.openai.api_server --model Copycats/Synatra-kiqu-10.7B-awq --quantization awq --dtype
+python3 -m vllm.entrypoints.openai.api_server --model Copycats/Synatra-kiqu-10.7B-awq --quantization awq --dtype half
 ```
+- `--model`: huggingface model path
+- `--quantization`: ”awq”
+- `--dtype`: “half” for FP16. Recommended for AWQ quantization.
 
 #### Querying the model using OpenAI Chat API:
 - You can use the create chat completion endpoint to communicate with the model in a chat-like interface: