Speed-up model loading

#4
by matanj - opened

Hi and thanks very much for your great work!
I have one issue though and would appreciate your help.

On a Jetson Xavier NX, the loading time of the model is very long.

I download and store the model once like this:

from transformers import AutoProcessor, OmDetTurboForObjectDetection

processor = AutoProcessor.from_pretrained("omlab/omdet-turbo-swin-tiny-hf")
model = OmDetTurboForObjectDetection.from_pretrained("omlab/omdet-turbo-swin-tiny-hf")
processor.save_pretrained("processor")
model.save_pretrained("model")

Later on, in a separate script, I load it like this:

from transformers import AutoProcessor, OmDetTurboForObjectDetection

processor = AutoProcessor.from_pretrained("processor")
model = OmDetTurboForObjectDetection.from_pretrained("model")

It takes about 5 minutes to load.
I tried upgrading the "accelerate" package and adding the flag "low_cpu_mem_usage=True", but neither seems to help.

Is it possible to reduce the loading time?
Many thanks.

Hi @matanj! Thanks for the issue. Is that time just for loading, or does it include the forward pass as well? I wonder if it might be caused by the custom kernel being built.
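
To narrow down where the time goes, it may help to time each stage on its own. The snippet below is only a rough sketch: `timed` and `profile_load` are hypothetical helper names, and the "processor" and "model" directories are assumed to be the ones written by save_pretrained() in the first post.

```python
import time
from typing import Any, Callable, Dict, Tuple


def timed(fn: Callable[[], Any]) -> Tuple[Any, float]:
    """Run fn() once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def profile_load(processor_dir: str = "processor",
                 model_dir: str = "model") -> Dict[str, float]:
    """Time the import, processor load, and model load separately.

    The directory names are assumptions taken from the save_pretrained()
    calls in the original post.
    """
    timings: Dict[str, float] = {}

    # The import is timed as well: on a slow disk, importing
    # torch/transformers can itself take a noticeable amount of time.
    def _import_transformers():
        import transformers
        return transformers

    transformers, timings["import"] = timed(_import_transformers)

    _, timings["processor"] = timed(
        lambda: transformers.AutoProcessor.from_pretrained(processor_dir)
    )
    _, timings["model"] = timed(
        lambda: transformers.OmDetTurboForObjectDetection.from_pretrained(model_dir)
    )
    return timings
```

Calling `profile_load()` in the loading script and printing the returned dict should show whether the import, the processor, or the model weights dominate. The first forward pass can be wrapped in `timed(...)` the same way to see whether any one-time setup happens there rather than during loading.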

Thanks Pavel.
Did you mean the long 5 minutes? That is for loading only, as I wrote.
Regarding the forward pass, it takes ~0.5 sec. I would love to reduce it somehow, e.g. with TensorRT, but I had trouble even converting the model to ONNX. I tried the optimum library, which uses onnxruntime, but it reduced the time by only ~10%; maybe I missed something.
