Is it the FP32 or FP16 version?
Is the ORT model fp32 or fp16? If it is fp32, can you share a way to export it to fp16 so that it can fit on a 16GB GPU?
Hi @pankajdev007, I used this repo as a base to export the model to ONNX, so I believe it's fp32.
You can check the weights by doing this after loading the model:
print(model.dtype)
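For example, a minimal sketch (EleutherAI/gpt-j-6B is just my assumption about which checkpoint you loaded):

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
print(model.dtype)  # torch.float32 for the standard checkpoint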
You can force the weights of your model to be fp16 by doing this in PyTorch:
net = Model()
net.half()
So you can probably do it with Transformers too!
(I found this on this PyTorch thread. Be careful with predictions after the conversion, because fp16 can produce NaNs.)
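For Transformers, a minimal sketch of the same idea (assuming the EleutherAI/gpt-j-6B checkpoint and enough CPU RAM to hold the fp32 weights while casting):

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
model = model.half()  # casts all parameters and buffers to fp16
print(model.dtype)    # torch.float16

# Loading directly in half precision should also work and skips the intermediate fp32 copy:
# model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", torch_dtype=torch.float16)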
Yes, I tried model.half(), but it does not apply to the ONNX model; it only works on the regular Transformers model. I need a way to convert GPT-J to ONNX fp16. I used the optimum ONNX exporter:

python -m optimum.exporters.onnx --task causal-lm-with-past --for-ort --model gpt-j-6B gptj16_onnx/

but I did not find a way to convert the result to FP16.
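One thing I have not tried yet is casting the exported ONNX graph itself after the fact. From what I can tell, the onnxconverter-common package has a converter for this (untested on my side; the file name below is an assumption about what the exporter wrote into gptj16_onnx/, and you need enough RAM to hold the fp32 graph):

import os
import onnx
from onnxconverter_common import float16

# Load the fp32 graph produced by the optimum export
# (use whatever .onnx file the exporter actually wrote into gptj16_onnx/)
model = onnx.load("gptj16_onnx/decoder_model.onnx")

# Cast initializers and float ops to fp16; keep_io_types leaves the graph inputs/outputs as fp32
model_fp16 = float16.convert_float_to_float16(model, keep_io_types=True)

# GPT-J is well above the 2 GB protobuf limit, so the weights must be stored as external data
os.makedirs("gptj16_fp16_onnx", exist_ok=True)
onnx.save_model(
    model_fp16,
    "gptj16_fp16_onnx/decoder_model.onnx",
    save_as_external_data=True,
    all_tensors_to_one_file=True,
)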
Could you try to load the PyTorch model, apply model.half(), save the PyTorch model, and then export this saved model to ONNX?
(To store the saved model, you can create a new HF repo.)
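Something along these lines (a sketch, assuming EleutherAI/gpt-j-6B as the source checkpoint; whether the exporter keeps everything in fp16 end to end is something you would need to verify):

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
model = model.half()  # cast the weights to fp16
print(model.dtype)    # torch.float16

# Save the half-precision checkpoint locally (or push it to a new HF repo)
model.save_pretrained("gpt-j-6B-fp16")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
tokenizer.save_pretrained("gpt-j-6B-fp16")

Then re-run the same export command, pointing --model at the saved folder:

python -m optimum.exporters.onnx --task causal-lm-with-past --for-ort --model gpt-j-6B-fp16 gptj16_onnx/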