Image size versus inference speed/accuracy
I'm curious how I can increase inference speed besides just using more VRAM. I've experimented with reducing the image size. Has anyone else tried anything to increase inference speed?
My test: how well does Molmo find UI elements on a page as the image resolution is reduced?
https://github.com/logankeenan/molmo-benchmarks/
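For reference, a minimal sketch of downscaling a screenshot before it reaches the processor (the file name and scale factor are just placeholders to experiment with):

```python
from PIL import Image

# Downscale the screenshot before preprocessing; fewer pixels generally means
# fewer image crops/tokens downstream. Path and scale are placeholders.
img = Image.open("page.png")
scale = 0.5
img_small = img.resize((int(img.width * scale), int(img.height * scale)), Image.LANCZOS)
```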
I am also trying to speed up inference. I think reducing the image size is similar to reducing max_crops for the processor, which leads to fewer image tokens to process and hence the speedup. I will update if my other speedup efforts work.
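A rough sketch of what lowering max_crops could look like; this assumes the Molmo image processor exposes max_crops as an attribute, and the checkpoint id is just an example, so check the model's remote preprocessing code for the exact name:

```python
from transformers import AutoProcessor

# Assumption: the Molmo image processor exposes a max_crops attribute
# (verify against the model's remote preprocessing code).
processor = AutoProcessor.from_pretrained("allenai/Molmo-7B-D-0924", trust_remote_code=True)
processor.image_processor.max_crops = 4  # fewer crops -> fewer image tokens -> faster prefill
```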
Hi @logankeenan, @yoarkyang, torch.nn.DataParallel works like a charm to speed up inference if you have multiple GPUs. Use this code snippet:
```python
import torch

num_gpus = torch.cuda.device_count()
device = "cuda" if num_gpus > 0 else "cpu"
if num_gpus > 1:
    # Replicate the model on every visible GPU; each forward pass splits the batch across them.
    model = torch.nn.DataParallel(model, device_ids=list(range(num_gpus)))
model.to(device)
```
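One caveat worth noting (not from the snippet above): DataParallel splits each input batch across GPUs, so it mainly pays off for batched inference, and after wrapping, the original model and any custom methods live under model.module:

```python
# After wrapping, custom methods (e.g. Molmo's generate_from_batch) sit on the underlying module.
base_model = model.module if isinstance(model, torch.nn.DataParallel) else model
```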
@amanrangapur - I've been using the vLLM implementation for now, but will give that a try in the future, thanks so much!
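For anyone else reading, a rough sketch of loading Molmo through vLLM; the model id, dtype, and prompt format here are assumptions, so verify them against vLLM's supported-models docs:

```python
from PIL import Image
from vllm import LLM, SamplingParams

# Example only: model id, dtype, and the prompt wording are assumptions;
# check vLLM's documentation for Molmo's exact prompt/multimodal input format.
llm = LLM(model="allenai/Molmo-7B-D-0924", trust_remote_code=True, dtype="bfloat16")
image = Image.open("page.png")
outputs = llm.generate(
    {"prompt": "Point to the search button.", "multi_modal_data": {"image": image}},
    SamplingParams(max_tokens=128, temperature=0.0),
)
print(outputs[0].outputs[0].text)
```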