Why does torch.compile give such a small speedup for the Donut model?
Hey!
Recently, I tried to use torch.compile to accelerate inference for Donut. However, I got only a very small improvement: 0.602±0.016 --> 0.596±0.016 (~1%). I tried all the available modes and so on, but the result didn't improve.
To check whether my setup and versions are okay, I tested resnet18, and there I got a result that matches the benchmarks: 0.004±0.00005 --> 0.0026±0.0001 (~35%).
My setup:
GPU A100 40GB
CUDA 12.0
Does anyone have any suggestions for what I should do to improve the effect of torch.compile for Donut?
Hi gorodnitskiy,
I see the same trend on my side. It may be consistent with this benchmark:
https://huggingface.co./docs/transformers/en/perf_torch_compile
where almost no improvement was observed for certain architectures, such as ViT, when the batch size is > 1.
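For anyone who wants to reproduce this kind of comparison, here is a minimal timing sketch. It is not the script used for the numbers in this thread; `benchmark`, `relative_improvement`, and `compare_eager_vs_compiled` are names made up for illustration, and `model` / `example_input` are placeholders for whatever module and input you are testing. It assumes PyTorch 2.x and only relies on `torch.compile`, `torch.no_grad`, and `torch.cuda.synchronize`.

```python
import statistics
import time


def benchmark(fn, warmup=3, iters=10, sync=None):
    """Return (mean, stdev) of the wall-clock seconds per call of fn()."""
    for _ in range(warmup):
        fn()  # warm-up calls absorb one-off costs such as compilation
    if sync is not None:
        sync()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for queued GPU work before stopping the clock
        times.append(time.perf_counter() - start)
    return statistics.mean(times), statistics.stdev(times)


def relative_improvement(baseline, optimized):
    """Fraction of time saved, e.g. 0.35 for a 35% reduction."""
    return (baseline - optimized) / baseline


def compare_eager_vs_compiled(model, example_input):
    """Time a module in eager mode and after torch.compile (hypothetical helper)."""
    import torch  # imported lazily so the helpers above work without PyTorch

    sync = torch.cuda.synchronize if torch.cuda.is_available() else None
    compiled = torch.compile(model)
    with torch.no_grad():
        eager_mean, _ = benchmark(lambda: model(example_input), sync=sync)
        compiled_mean, _ = benchmark(lambda: compiled(example_input), sync=sync)
    return eager_mean, compiled_mean, relative_improvement(eager_mean, compiled_mean)
```

The warm-up calls matter because the first calls to a compiled module include compilation time, and the synchronize call matters because CUDA kernels are launched asynchronously. Plugging the numbers from the question into `relative_improvement` gives about 0.01 for Donut and 0.35 for resnet18, which matches the ~1% and ~35% quoted above.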