Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models Paper • 2404.13013 • Published Apr 19, 2024 • 31
Article A failed experiment: Infini-Attention, and why we should keep trying? Aug 14, 2024 • 57
Article Docmatix - a huge dataset for Document Visual Question Answering Jul 18, 2024 • 72