How to convert original model to q4f16 or q4 for web?

#1
by nickelshh - opened

How do I convert the original model to q4f16 or q4 for the web? Converting with the Optimum CLI plus `quantize_dynamic` with QInt4 doesn't seem to work in onnxruntime-web.

I converted the model with this script, which also handles the 4-bit quantization:
https://github.com/microsoft/onnxruntime-genai/blob/main/src/python/py/models/builder.py
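For anyone else landing here: the builder can also be run as a module, with the target precision and execution provider passed as flags. A minimal sketch (the model ID and output path here are placeholders, and exact flag support can vary between onnxruntime-genai versions, so check `--help` for your install):

```shell
pip install onnxruntime-genai

# -m: source model (Hugging Face ID or local path) -- placeholder here
# -o: output directory for the generated ONNX model
# -p: precision (int4 for 4-bit quantization)
# -e: execution provider target (e.g. web for onnxruntime-web)
python -m onnxruntime_genai.models.builder \
    -m microsoft/Phi-3-mini-4k-instruct \
    -o ./model-int4-web \
    -p int4 \
    -e web
```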
