DeepSeek V3 AWQ

An AWQ quantization of the DeepSeek V3 chat model.

This quant modifies some of the model code to fix an overflow issue that occurs when running in float16.

Tested on vLLM with 8x H100; inference speed is about 5 tokens/s with batch size 1 and short prompts.
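A minimal launch sketch for the vLLM setup described above. The flag names follow current vLLM conventions but should be verified against your installed version; `--trust-remote-code` is assumed to be required because the repo ships modified model code, and `--dtype float16` matches the overflow fix mentioned above.

```shell
# Sketch: serve this quant with vLLM across 8 GPUs (verify flags for your vLLM version)
vllm serve cognitivecomputations/DeepSeek-V3-AWQ \
    --quantization awq \
    --tensor-parallel-size 8 \
    --trust-remote-code \
    --dtype float16
```

Once the server is up, it exposes an OpenAI-compatible endpoint on port 8000 by default.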
