Quantize more models? #3
opened by MiaoCata
Great work! There's a fine-tuned model called DeepScaleR that performs even better. Could you quantize it with NexaQuant, starting from the original Q8_0?
I think Q8_0 and FP16 deliver similar performance, so starting from Q8_0 may even be faster.
@Losanti123 Thanks for bringing this up! Currently, we only support Q4_0, because that is where standard quantization causes an observable loss in reasoning performance. 8-bit quantization, by contrast, loses very little compared to FP16, so it doesn't benefit much from NexaQuant.
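For anyone curious why the gap matters, here's a minimal sketch (plain NumPy, not NexaQuant or the actual GGUF block formats) that quantizes a random weight tensor with symmetric round-to-nearest at 4-bit and 8-bit and compares the reconstruction error; the tensor shape and the per-tensor scheme are illustrative assumptions only.

```python
import numpy as np

# Toy sketch (not NexaQuant): simulate symmetric round-to-nearest
# quantization at 4-bit and 8-bit to show why 8-bit stays close to
# FP16 while 4-bit loses noticeably more precision.

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.02, size=100_000).astype(np.float32)  # stand-in weight tensor

def quantize_dequantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric per-tensor quantization, then dequantize back to float."""
    qmax = 2 ** (bits - 1) - 1          # 7 for 4-bit, 127 for 8-bit
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale

for bits in (4, 8):
    recon = quantize_dequantize(weights, bits)
    rel_err = np.linalg.norm(weights - recon) / np.linalg.norm(weights)
    print(f"{bits}-bit relative reconstruction error: {rel_err:.4%}")

# The 4-bit error comes out roughly an order of magnitude larger than
# the 8-bit error, which is why plain Q4_0 can visibly hurt reasoning
# while Q8_0 stays near FP16 quality.
```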