Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment Paper • 2412.19326 • Published about 1 month ago • 18
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding Paper • 2403.15377 • Published Mar 22, 2024 • 23
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation Paper • 2307.06942 • Published Jul 13, 2023 • 22
JourneyDB: A Benchmark for Generative Image Understanding Paper • 2307.00716 • Published Jul 3, 2023 • 19