GPU inference

by Crowlley - opened May 15, 2024

May 15, 2024

I have a working implementation in C# using this model, but it seems to run only on the CPU. Loading it on the GPU (at least with the DirectML provider) shows some warnings, and it crashes when trying to run inference.
I know the model has custom operations in place so that it can run with ONNX Runtime, but even if some of the model's operations could run on the GPU, they simply don't. It is either CPU inference or a crash.
I'm not knowledgeable enough, but is it possible to fix that at all?

Thank you for the working conversion.

anodev

Carve org May 15, 2024

edited May 15, 2024

Are you using the lama_fp32.onnx model? If you are using the old model, lama.onnx, we recommend trying the new one, lama_fp32.onnx.
The new model can be exported to TensorRT to run fully on an NVIDIA GPU. It also runs on the GPU with ONNX Runtime on Linux using CUDAExecutionProvider without crashing: only part of the graph runs on the GPU, but it is fast.

We don't use Windows, so we have no way to test how the models work with DirectML. You can try setting the log severity level recommended by ONNX Runtime to see which operators fail, fixing those operators in our implementation yourself, and then exporting the model again. Contributions are welcome.
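For anyone following the logging suggestion above, here is a minimal sketch using the ONNX Runtime Python API (the original report uses C#, which exposes equivalent session options). It assumes the onnxruntime package is installed. The helper names pick_providers and make_verbose_session are hypothetical, and the model path is whatever local copy of lama_fp32.onnx you have.

```python
def pick_providers(available,
                   preferred=("DmlExecutionProvider", "CUDAExecutionProvider")):
    """Keep the preferred GPU providers that are actually available,
    and always append the CPU provider as the final fallback."""
    chosen = [p for p in preferred if p in available]
    chosen.append("CPUExecutionProvider")
    return chosen


def make_verbose_session(model_path):
    """Create a session that logs at VERBOSE level, so the log shows
    which execution provider each node was assigned to."""
    # Imported lazily so pick_providers() can be used without onnxruntime.
    import onnxruntime as ort

    opts = ort.SessionOptions()
    opts.log_severity_level = 0  # 0 = VERBOSE, 2 = WARNING (the default)
    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, sess_options=opts,
                                providers=providers)
```

Calling make_verbose_session("lama_fp32.onnx") and reading the verbose output should show which nodes fall back to the CPU. This is only a diagnostic aid and does not by itself fix any unsupported operator.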

The lama.onnx model cannot be fixed to run on GPU.

Crowlley

May 15, 2024

Yes, I'm using the new one, and I've tried both. It might be a DirectML problem, then. I don't have a CUDA GPU.

Crowlley

May 15, 2024

edited May 15, 2024

This is what I get when running inference on the GPU with DML:

2024-05-15 16:45:08.1821135 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-15 16:45:08.1926379 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-05-15 16:45:55.3025031 [E:onnxruntime:, sequential_executor.cc:514 onnxruntime::ExecuteKernel] Non-zero status code returned while running MatMul node. Name:'/generator/model/model.5/conv1/ffc/convg2g/fu/rttn/MatMul_5' Status Message: D:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2482)\onnxruntime.DLL!00007FFB35387AA5: (caller: 00007FFB3538712D) Exception(3) tid(3428) 80070057 The parameter is incorrect.

I'll try to fix it but I probably won't be able to.
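The error line in the log above names the operator type and the node that failed, which is the place to start looking in the exported graph. A small sketch for pulling those two fields out of a captured log follows. The regular expression is an assumption based only on the message format shown in this thread, and failing_node is a hypothetical helper name.

```python
import re

# Matches the "while running <OpType> node. Name:'<node name>'" part of the
# ONNX Runtime error message quoted in this thread.
NODE_ERROR = re.compile(
    r"while running (?P<op>\w+) node\. Name:'(?P<name>[^']+)'"
)


def failing_node(log_text):
    """Return (op_type, node_name) for the first failing node in the log,
    or None if no such error line is present."""
    match = NODE_ERROR.search(log_text)
    if match is None:
        return None
    return match.group("op"), match.group("name")


# Fragment copied from the error line above.
sample = (
    "Non-zero status code returned while running MatMul node. "
    "Name:'/generator/model/model.5/conv1/ffc/convg2g/fu/rttn/MatMul_5' "
    "Status Message: ..."
)
print(failing_node(sample))
```

Here the failing node is a MatMul inside the fu/rttn part of the generator, so that is the operator to inspect when re-exporting the model.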
