bowenbaoamd committed · verified
Commit 02dadcc · 1 Parent(s): b4cd529

Upload README.md with huggingface_hub

Files changed (1): README.md (+4 -2)
README.md CHANGED

````diff
@@ -23,7 +23,8 @@ python3 quantize_quark.py \
     --kv_cache_dtype fp8 \
     --num_calib_data 128 \
     --model_export quark_safetensors \
-    --no_weight_matrix_merge
+    --no_weight_matrix_merge \
+    --custom_mode fp8
 # If model size is too large for single GPU, please use multi GPU instead.
 python3 quantize_quark.py \
     --model_dir $MODEL_DIR \
@@ -33,7 +34,8 @@ python3 quantize_quark.py \
     --num_calib_data 128 \
     --model_export quark_safetensors \
     --no_weight_matrix_merge \
-    --multi_gpu
+    --multi_gpu \
+    --custom_mode fp8
 ```
 ## Deployment
 Quark has its own export format and allows FP8 quantized models to be efficiently deployed using the vLLM backend(vLLM-compatible).
````
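
For context, serving the exported checkpoint with vLLM typically looks like the sketch below. This is not part of this commit: the model path is a placeholder, and the `--quantization quark` and `--kv-cache-dtype fp8` flags are assumptions based on vLLM's Quark support; exact flag names and availability depend on your vLLM version.

```shell
# Hypothetical deployment sketch (not from this commit).
# Assumes a vLLM build with Quark FP8 support; verify flags against
# your installed version with `vllm serve --help`.
vllm serve /path/to/quark_exported_model \
    --quantization quark \
    --kv-cache-dtype fp8
```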