GGUF, AWQ, GPTQ ?

#1
by 0xbcn - opened

Sorry if you see this question a lot, but what exactly is the difference between those 3 file formats?
Or better asked: what should I use if I have a GPU available for inference and prefer maximum performance over "everything else"?

For now I opted for GGUF and the llama.cpp implementation, assuming C++ performs well in this area.

GGUF is best for CPU and Mac.

GPTQ is best for GPU inference with ExLlama, and good for servers since it can be used with TGI.

AWQ is the highest quality and best for servers, since it can be used with vLLM.

So: most likely GPTQ if you have a GPU that can fit the whole model, GGUF if you have a Mac or CPU only (or cannot fit the model in GPU memory), and AWQ for server inference with vLLM.
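The rule of thumb above can be sketched as a tiny helper. This is purely illustrative; `pick_format` and its parameters are hypothetical names, not part of any library:

```python
def pick_format(has_gpu: bool, fits_in_vram: bool = False,
                server_with_vllm: bool = False) -> str:
    """Rough decision rule from this thread (illustrative only).

    - AWQ for server inference with vLLM
    - GPTQ when a GPU can hold the whole model (e.g. with ExLlama or TGI)
    - GGUF otherwise: CPU, Mac, or model too big for VRAM (via llama.cpp)
    """
    if server_with_vllm:
        return "AWQ"
    if has_gpu and fits_in_vram:
        return "GPTQ"
    return "GGUF"

print(pick_format(has_gpu=True, fits_in_vram=True))      # GPTQ
print(pick_format(has_gpu=False))                        # GGUF
print(pick_format(has_gpu=True, server_with_vllm=True))  # AWQ
```

In practice you would also weigh quantization bit-width and kernel support, but this captures the hardware-first logic of the answer.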
