- VoCo-LLaMA: Towards Vision Compression with Large Language Models
  Paper • 2406.12275 • Published • 31
- TroL: Traversal of Layers for Large Language and Vision Models
  Paper • 2406.12246 • Published • 35
- Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning
  Paper • 2406.15334 • Published • 9
- Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning
  Paper • 2406.12742 • Published • 15
Po Hsiang Yu
EasyMoneySniper66
AI & ML interests
None yet
Organizations
None yet
Collections
4
- MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs
  Paper • 2406.11833 • Published • 63
- Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models
  Paper • 2406.11230 • Published • 34
- Two Giraffes in a Dirt Field: Using Game Play to Investigate Situation Modelling in Large Multimodal Models
  Paper • 2406.14035 • Published • 13
- Needle In A Multimodal Haystack
  Paper • 2406.07230 • Published • 53
models
None public yet
datasets
None public yet