MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning Paper • 2502.19634 • Published 2 days ago • 42
Introducing Visual Perception Token into Multimodal Large Language Model Paper • 2502.17425 • Published 4 days ago • 11
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference Paper • 2502.18411 • Published 3 days ago • 61
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models Paper • 2502.16033 • Published 7 days ago • 15
Evaluating Multimodal Generative AI with Korean Educational Standards Paper • 2502.15422 • Published 7 days ago • 9
VLM^2-Bench: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues Paper • 2502.12084 • Published 11 days ago • 29
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published 8 days ago • 118
GIMMICK -- Globally Inclusive Multimodal Multitask Cultural Knowledge Benchmarking Paper • 2502.13766 • Published 9 days ago • 3
InfiR: Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning Paper • 2502.11573 • Published 12 days ago • 8
HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation Paper • 2502.09838 • Published 15 days ago • 9
RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm Paper • 2502.12513 • Published 11 days ago • 15
Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation Paper • 2502.13145 • Published 10 days ago • 35
Soundwave: Less is More for Speech-Text Alignment in LLMs Paper • 2502.12900 • Published 10 days ago • 76
HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation Paper • 2502.12148 • Published 11 days ago • 16
ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models Paper • 2502.09696 • Published 15 days ago • 38
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment Paper • 2502.10391 • Published 14 days ago • 30
mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data Paper • 2502.08468 • Published 16 days ago • 13