allow dynamic batch size
Hi @l-i ,
Thanks for opening the issue. You can always enable `dynamic_batch_size` when compiling the checkpoint.
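For instance, with the Python API (a minimal sketch; the model ID below is a stand-in, since the exact checkpoint isn't named in this thread):

```python
from optimum.neuron import NeuronStableDiffusionXLPipeline

# Export/compile with dynamic batch size enabled. With dynamic_batch_size=True,
# the compiled model accepts inputs whose batch size is a multiple of the
# batch_size used at compilation.
pipe = NeuronStableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # stand-in checkpoint
    export=True,
    dynamic_batch_size=True,
    batch_size=1,
    height=1024,
    width=1024,
)
pipe.save_pretrained("sdxl_neuron/")
```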
In our previous experience, we reached better latency with a static batch size. But maybe we could add a checkpoint compiled with dynamic batch size, @philschmid, WDYT?
Yes, the latest information we have is that it's optimized for BS=1, but @l-i you should be able to compile it with whatever batch size you want by following https://huggingface.co./docs/optimum-neuron/guides/export_model#exporting-stable-diffusion-xl-to-neuron.
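As a sketch of that guide's flow with a custom static batch size (values here are illustrative, not from this thread):

```python
from optimum.neuron import NeuronStableDiffusionXLPipeline

# Compile with a static batch size of 4 (illustrative value); inference must
# then use exactly this batch size.
pipe = NeuronStableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # stand-in checkpoint
    export=True,
    batch_size=4,
    height=1024,
    width=1024,
)
pipe.save_pretrained("sdxl_neuron_bs4/")

# At inference, the prompt batch must match the compiled batch size.
images = pipe(prompt=["a photo of an astronaut riding a horse"] * 4).images
```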
Thank you for sharing the details!
I have two follow-up questions:
- if I enable dynamic batch size during compilation but still run inference at the original static batch size, will it still affect latency?
- if I compile with a larger static batch size, how much larger does my machine need to be? (I tried batch size 6 on a 24xlarge, and it seemed to fail)
Just to point out: if you are compiling for a large batch size and even a 24xlarge runs OOM, you could try a CPU-only instance just for the compilation.
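A rough sketch of that workflow, assuming the Neuron compiler (neuronx-cc) is installed on the CPU instance (compilation doesn't need a Neuron device; only inference does):

```python
# On a memory-heavy CPU-only instance: compile and save the artifacts.
from optimum.neuron import NeuronStableDiffusionXLPipeline

pipe = NeuronStableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # stand-in checkpoint
    export=True,
    batch_size=6,  # the larger batch size that OOMed on the 24xlarge
    height=1024,
    width=1024,
)
pipe.save_pretrained("sdxl_neuron_bs6/")

# Then copy sdxl_neuron_bs6/ to the Inferentia instance and load the
# precompiled model there without recompiling:
# pipe = NeuronStableDiffusionXLPipeline.from_pretrained("sdxl_neuron_bs6/")
```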