SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval Paper • 2009.13013 • Published Sep 28, 2020
How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection Paper • 2308.13177 • Published Aug 25, 2023
Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head Paper • 2403.06892 • Published Mar 11, 2024 • 1
OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding Paper • 2407.04923 • Published Jul 6, 2024 • 1
OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer Paper • 2406.16620 • Published Jun 24, 2024 • 2
ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration Paper • 2411.16044 • Published Nov 25, 2024 • 1
OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network Paper • 2209.05946 • Published Sep 10, 2022 • 1
VL-CheckList: Evaluating Pre-trained Vision-Language Models with Objects, Attributes and Relations Paper • 2207.00221 • Published Jul 1, 2022 • 1
GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection Paper • 2312.15043 • Published Dec 22, 2023 • 1
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing Paper • 2306.11300 • Published Jun 20, 2023 • 1
GUI Testing Arena: A Unified Benchmark for Advancing Autonomous GUI Testing Agent Paper • 2412.18426 • Published Dec 24, 2024
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22, 2025 • 214