naver-clova-ix/donut-base · Why does torch.compile give very little speedup for the Donut model?

Hey!

Recently, I tried to use torch.compile to speed up inference for Donut. However, I got only a very small improvement: 0.602±0.016 --> 0.596±0.016 (~1%). I tried all the available modes and other options, but none of them improved this result.

To check whether my setup and versions are okay, I tested resnet18, and there I got a result that matches the benchmarks: 0.004±0.00005 --> 0.0026±0.0001 (~35%).
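
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of how mean±std timings and the quoted percentages could be computed. It assumes the figures above are per-call wall-clock times in seconds (the post does not state the unit). The helper names `benchmark` and `relative_improvement` are hypothetical, and this is not necessarily how the numbers above were measured.

```python
import statistics
import time


def benchmark(fn, warmup=3, iters=20, sync=None):
    """Return (mean, std) of wall-clock seconds per call of fn().

    `sync` is an optional callable run after each call. On a GPU one
    would typically pass torch.cuda.synchronize so that queued kernels
    are finished before the timer stops (assumption: PyTorch on CUDA).
    """
    # Warm-up calls are not timed: with a compiled model the first
    # calls include compilation, which would distort the average.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    times = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        times.append(time.perf_counter() - start)
    return statistics.mean(times), statistics.stdev(times)


def relative_improvement(before, after):
    """Fraction of per-call time saved, e.g. 0.35 means 35% less time."""
    return (before - after) / before


# Figures quoted in this post.
donut = relative_improvement(0.602, 0.596)
resnet = relative_improvement(0.004, 0.0026)
print(f"Donut: {donut:.1%}, resnet18: {resnet:.1%}")
```

To time a model, `fn` would wrap a single inference call, measured once for the eager model and once for the compiled one with the same inputs.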

My setup:
GPU A100 40GB
CUDA 12.0

Does anyone have any suggestions for how I can get a bigger speedup from torch.compile for Donut?