Is it the FP32 or FP16 version?
Is the ORT model fp32 or fp16? If it is fp32, can you share a way to export it to fp16 so that it can fit on a 16GB GPU?
Hi @pankajdev007, I used this repo as a base to export the model to ONNX, so I believe it's fp32.
You can check the weights by doing this after loading the model:
print(model.dtype)
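For example, a minimal sketch (EleutherAI/gpt-j-6B is just my assumption about which checkpoint you loaded):

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
print(model.dtype)  # torch.float32 for the standard checkpoint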
You can force the weights of your model to be fp16 by doing this in PyTorch:
net = Model()
net.half()
So you can probably do it with Transformers too!
(I found this on this PyTorch thread. Be careful with predictions after the conversion, because fp16 can produce NaNs.)
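For Transformers, a minimal sketch of the same idea (assuming the EleutherAI/gpt-j-6B checkpoint and enough CPU RAM to hold the fp32 weights while casting):

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
model = model.half()  # casts all parameters and buffers to fp16
print(model.dtype)    # torch.float16

# Loading directly in half precision should also work and skips the intermediate fp32 copy:
# model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", torch_dtype=torch.float16)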
Yes, I tried model.half(), but it does not apply to the ONNX model; it only works on the regular Transformers model. I need a way to convert GPT-J to ONNX fp16. I used the optimum ONNX exporter:

python -m optimum.exporters.onnx --task causal-lm-with-past --for-ort --model gpt-j-6B gptj16_onnx/

but I did not find a way to convert the result to FP16.
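One thing I have not tried yet is casting the exported ONNX graph itself after the fact. From what I can tell, the onnxconverter-common package has a converter for this (untested on my side; the file name below is an assumption about what the exporter wrote into gptj16_onnx/, and you need enough RAM to hold the fp32 graph):

import os
import onnx
from onnxconverter_common import float16

# Load the fp32 graph produced by the optimum export
# (use whatever .onnx file the exporter actually wrote into gptj16_onnx/)
model = onnx.load("gptj16_onnx/decoder_model.onnx")

# Cast initializers and float ops to fp16; keep_io_types leaves the graph inputs/outputs as fp32
model_fp16 = float16.convert_float_to_float16(model, keep_io_types=True)

# GPT-J is well above the 2 GB protobuf limit, so the weights must be stored as external data
os.makedirs("gptj16_fp16_onnx", exist_ok=True)
onnx.save_model(
    model_fp16,
    "gptj16_fp16_onnx/decoder_model.onnx",
    save_as_external_data=True,
    all_tensors_to_one_file=True,
)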
Could you try to load the PyTorch model, apply model.half(), save the PyTorch model, and then export this saved model to ONNX?
(To store the saved model, you can create a new HF repo.)
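Something along these lines (a sketch, assuming EleutherAI/gpt-j-6B as the source checkpoint; whether the exporter keeps everything in fp16 end to end is something you would need to verify):

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
model = model.half()  # cast the weights to fp16
print(model.dtype)    # torch.float16

# Save the half-precision checkpoint locally (or push it to a new HF repo)
model.save_pretrained("gpt-j-6B-fp16")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
tokenizer.save_pretrained("gpt-j-6B-fp16")

Then re-run the same export command, pointing --model at the saved folder:

python -m optimum.exporters.onnx --task causal-lm-with-past --for-ort --model gpt-j-6B-fp16 gptj16_onnx/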