Image size versus inference speed/accuracy

#22 opened by logankeenan

I'm curious how I can increase inference speed besides just using more VRAM. I've experimented with reducing the image size. Has anyone else tried anything to increase inference speed?

My test: how well does Molmo perform at finding UI elements on a page as the image resolution is reduced?
https://github.com/logankeenan/molmo-benchmarks/
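
For context, the resizing I'm testing is just a PIL downscale before the image ever reaches the processor. Something along these lines (a sketch; the file name and scale factor are placeholders):

from PIL import Image

image = Image.open("page.png")
scale = 0.5  # illustrative value; smaller means fewer image tokens but less detail
resized = image.resize(
    (int(image.width * scale), int(image.height * scale)),
    Image.LANCZOS,
)
# pass `resized` to the Molmo processor instead of the full-resolution screenshot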

I am also trying to speed up inference. I think reducing the image size has a similar effect to reducing max_crops on the processor: both lead to fewer image tokens being processed, hence the speedup. I will update if my other speed-up efforts work.
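
Roughly what I mean by lowering max_crops (a sketch; I'm assuming the remote-code image processor loaded via AutoProcessor exposes a max_crops setting, and the model id and crop count below are just examples):

from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained(
    "allenai/Molmo-7B-D-0924",
    trust_remote_code=True,
)
# fewer crops -> fewer image tokens -> faster prefill, at some cost in detail
# assumes the Molmo image processor exposes `max_crops` (default is 12)
processor.image_processor.max_crops = 4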

Hi @logankeenan, @yoarkyang,
torch.nn.DataParallel works like a charm for speeding up inference if you have multiple GPUs. Use this code snippet:

import torch

num_gpus = torch.cuda.device_count()
device = torch.device("cuda" if num_gpus > 0 else "cpu")
if num_gpus > 1:
    # replicate the model across all visible GPUs; input batches are split along dim 0
    model = torch.nn.DataParallel(model, device_ids=list(range(num_gpus)))
model.to(device)

@amanrangapur - I've been using the vLLM implementation for now, but I'll give that a try in the future. Thanks so much!
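
For anyone who lands here, this is roughly what my vLLM setup looks like (a sketch; the model id, sampling parameters, and prompt template are only examples, and you need a vLLM build with Molmo support):

from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(model="allenai/Molmo-7B-D-0924", trust_remote_code=True)
sampling = SamplingParams(temperature=0.0, max_tokens=256)

image = Image.open("page.png")
outputs = llm.generate(
    {
        # prompt format is illustrative; check vLLM's vision-language examples
        # for the exact Molmo template
        "prompt": "USER: <image>\nPoint to the login button. ASSISTANT:",
        "multi_modal_data": {"image": image},
    },
    sampling_params=sampling,
)
print(outputs[0].outputs[0].text)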
