Apply for community grant: Academic project (gpu)

#1
by zhangtao-whu - opened

We propose OMG-LLaVA, an elegant multi-modal large language model that can perform image-level, object-level, and pixel-level understanding and reasoning tasks. Our work offers the community new insights into efficiently building an MLLM that can understand images and visual prompts, follow text instructions, and output segmentation masks using only a visual encoder, a decoder, and an LLM. We would like to build an online Gradio demo to help the community experience and better understand OMG-LLaVA. However, we lack a stable platform to host it, so we kindly request a GPU grant to complete the deployment. Thank you very much.