- Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences
  Paper • 2401.10529 • Published • 1
- ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
  Paper • 2311.12793 • Published • 18
- Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
  Paper • 2311.06783 • Published • 26
- SVIT: Scaling up Visual Instruction Tuning
  Paper • 2307.04087 • Published • 6
Sulabh
sulabh-research
AI & ML interests
None yet
Organizations
None yet
Collections
10
models
None public yet
datasets
None public yet