How to convert original model to q4f16 or q4 for web?

#1
by nickelshh - opened

How do I convert the original model to q4f16 or q4 for the web? Converting with the Optimum CLI plus `quantize_dynamic` with QInt4 doesn't seem to work in onnxruntime-web.

I converted the model with this script, which also handles the 4-bit quantization:
https://github.com/microsoft/onnxruntime-genai/blob/main/src/python/py/models/builder.py
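For anyone else landing here: the builder can also be run as a module, with the target precision and execution provider passed as flags. A minimal sketch (the model ID and output path here are placeholders, and exact flag support can vary between onnxruntime-genai versions, so check `--help` for your install):

```shell
pip install onnxruntime-genai

# -m: source model (Hugging Face ID or local path) -- placeholder here
# -o: output directory for the generated ONNX model
# -p: precision (int4 for 4-bit quantization)
# -e: execution provider target (e.g. web for onnxruntime-web)
python -m onnxruntime_genai.models.builder \
    -m microsoft/Phi-3-mini-4k-instruct \
    -o ./model-int4-web \
    -p int4 \
    -e web
```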
