Quantizing ONNX FP32 to q4f16 for Web
The web has a model size limitation, and Phi-3.5 uses q4f16 to reduce the weight size. Is there any public framework that can do that?
It is pretty common to use 4-bit quantization for LLMs. I used this script, which takes care of it:
https://github.com/microsoft/onnxruntime-genai/blob/main/src/python/py/models/builder.py
and under the hood it will use
https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/python/tools/quantization
for the quantization.
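If you want to call that quantization tooling directly instead of going through builder.py, a minimal sketch looks like this. The paths and the block_size/is_symmetric values are placeholders, not the builder's exact settings, and the constructor arguments may differ by onnxruntime version:

import onnx
from onnxruntime.quantization.matmul_4bits_quantizer import MatMul4BitsQuantizer

# placeholder paths, not the builder's real layout
model = onnx.load("model_fp32.onnx")
quantizer = MatMul4BitsQuantizer(model, block_size=32, is_symmetric=True)
quantizer.process()  # rewrites the MatMul weights into 4-bit blocks
quantizer.model.save_model_to_file(
    "model_q4.onnx",
    use_external_data_format=True,  # keep the large weights in a .data file
)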
Thanks for the info. It seems that even using this builder.py I cannot make the external data small enough to fit the Chrome browser fetch limitation, and the model.onnx is only around 1 MB. Is there any specific parameter to make model.onnx bigger and the external data smaller? Thanks.
It seems I can change size_threshold to let bigger tensors go into the external data file. But even this way, I cannot get the same inference results as your original onnx_web model. Is there any specific setting for converting to web, such as builder.py xxx -e web? Thank you.
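(For reference, I assume size_threshold here maps to the external-data option when the model is saved, roughly like the following; the paths are just placeholders.)

import onnx

m = onnx.load("model_q4f16.onnx")
onnx.save_model(
    m,
    "model_q4f16_resaved.onnx",
    save_as_external_data=True,
    all_tensors_to_one_file=True,
    location="model_q4f16.onnx.data",
    size_threshold=0,  # only tensors at least this many bytes are moved out to the .data file
    convert_attribute=False,
)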
I use this script to shape the external data:
https://github.com/guschmue/ort-web-perf/blob/master/onnx-chunk-external-data.py
and this script to cast the logits to fp32 so JavaScript does not need to deal with fp16:
https://github.com/guschmue/ort-web-perf/blob/master/onnx-wrap-fp16.py
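The wrapping idea is basically to rename the fp16 logits output and append a Cast back to fp32. A simplified sketch of that (not the script itself, and it assumes the output is named logits):

import onnx
from onnx import TensorProto, helper

m = onnx.load("model.onnx")  # placeholder path
g = m.graph
old_name, fp16_name = "logits", "logits_fp16"

# rewire the node that currently produces the fp16 logits
for node in g.node:
    node.output[:] = [fp16_name if o == old_name else o for o in node.output]

# cast fp16 -> fp32 and keep the public output name "logits"
g.node.append(helper.make_node("Cast", inputs=[fp16_name], outputs=[old_name], to=TensorProto.FLOAT))
for out in g.output:
    if out.name == old_name:
        out.type.tensor_type.elem_type = TensorProto.FLOAT

onnx.save_model(m, "model_fp32_logits.onnx", save_as_external_data=True, all_tensors_to_one_file=True)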
The entire thing looks like this:
root=$PWD
model=models/tjs/Phi-3.5-mini-instruct-onnx-web
# build the int4 web model with onnxruntime-genai's builder.py
python builder.py -m models/microsoft/Phi-3.5-mini-instruct -o $model -p int4 -e web
# clean up temp files and start from a fresh output dir
rm -rf /tmp/opt.* /tmp/model.onnx* $model/onnx
mkdir $model/onnx
# cast the fp16 logits output to fp32
python onnx/onnx-wrap-fp16.py --input $model/model.onnx --output /tmp/model.onnx --external_data --name logits
# repack the external data (chunk script from ort-web-perf)
python onnx/onnx-chunk-external-data.py --threshhold 1 --maxchunks 1 --input /tmp/model.onnx --output $model/onnx/model_q4f16.onnx
# copy the config/tokenizer json files next to the model
cp models/microsoft/Phi-3.5-mini-instruct/*.json $model/