allow dynamic batch size
Hi @l-i ,
Thanks for opening the issue. You can always enable `dynamic_batch_size` when compiling the checkpoint.
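For instance, with the Python API (a minimal sketch; the model ID below is a stand-in, since the exact checkpoint isn't named in this thread):

```python
from optimum.neuron import NeuronStableDiffusionXLPipeline

# Export/compile with dynamic batch size enabled. With dynamic_batch_size=True,
# the compiled model accepts inputs whose batch size is a multiple of the
# batch_size used at compilation.
pipe = NeuronStableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # stand-in checkpoint
    export=True,
    dynamic_batch_size=True,
    batch_size=1,
    height=1024,
    width=1024,
)
pipe.save_pretrained("sdxl_neuron/")
```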
In our previous experience, we reached better latency with a static batch size. But maybe we could add a checkpoint compiled with dynamic batch size, @philschmid, WDYT?
Yes, the latest information we have is that it's optimized for BS=1, but @l-i you should be able to compile it with whatever batch size you want by following https://huggingface.co./docs/optimum-neuron/guides/export_model#exporting-stable-diffusion-xl-to-neuron.
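As a sketch of that guide's flow with a custom static batch size (values here are illustrative, not from this thread):

```python
from optimum.neuron import NeuronStableDiffusionXLPipeline

# Compile with a static batch size of 4 (illustrative value); inference must
# then use exactly this batch size.
pipe = NeuronStableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # stand-in checkpoint
    export=True,
    batch_size=4,
    height=1024,
    width=1024,
)
pipe.save_pretrained("sdxl_neuron_bs4/")

# At inference, the prompt batch must match the compiled batch size.
images = pipe(prompt=["a photo of an astronaut riding a horse"] * 4).images
```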
Thank you for sharing the details!
I have two follow-up questions:
- if I enable dynamic batch size during compilation but still run inference at the original static batch size, will it still affect latency?
- if I compile with a larger static batch size, how much larger does my machine need to be? (I tried batch size 6 on a 24xlarge, and it seemed to fail)
Just to point out: if you are compiling for a large batch size and even a 24xlarge runs OOM, you could try a CPU-only instance just for the compilation.
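A rough sketch of that workflow, assuming the Neuron compiler (neuronx-cc) is installed on the CPU instance (compilation doesn't need a Neuron device; only inference does):

```python
# On a memory-heavy CPU-only instance: compile and save the artifacts.
from optimum.neuron import NeuronStableDiffusionXLPipeline

pipe = NeuronStableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # stand-in checkpoint
    export=True,
    batch_size=6,  # the larger batch size that OOMed on the 24xlarge
    height=1024,
    width=1024,
)
pipe.save_pretrained("sdxl_neuron_bs6/")

# Then copy sdxl_neuron_bs6/ to the Inferentia instance and load the
# precompiled model there without recompiling:
# pipe = NeuronStableDiffusionXLPipeline.from_pretrained("sdxl_neuron_bs6/")
```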