Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks Paper • 2501.11733 • Published 3 days ago • 24
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality Paper • 2304.14178 • Published Apr 27, 2023 • 3
StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding Paper • 1908.04577 • Published Aug 13, 2019
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model Paper • 2310.05126 • Published Oct 8, 2023 • 1
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding Paper • 2307.02499 • Published Jul 4, 2023 • 15
BUS: Efficient and Effective Vision-language Pre-training with Bottom-Up Patch Summarization Paper • 2307.08504 • Published Jul 17, 2023
CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility Paper • 2307.09705 • Published Jul 19, 2023 • 1
Enabling Weak LLMs to Judge Response Reliability via Meta Ranking Paper • 2402.12146 • Published Feb 19, 2024
Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion Paper • 2402.12195 • Published Feb 19, 2024
Evaluation and Analysis of Hallucination in Large Vision-Language Models Paper • 2308.15126 • Published Aug 29, 2023 • 1
ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models Paper • 2309.00986 • Published Sep 2, 2023 • 20
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding Paper • 2403.12895 • Published Mar 19, 2024 • 32
TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning Paper • 2404.16635 • Published Apr 25, 2024 • 2
HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training Paper • 2212.14546 • Published Dec 30, 2022
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video Paper • 2302.00402 • Published Feb 1, 2023
MIBench: Evaluating Multimodal Large Language Models over Multiple Images Paper • 2407.15272 • Published Jul 21, 2024 • 10
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models Paper • 2408.04840 • Published Aug 9, 2024 • 34
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration Paper • 2406.01014 • Published Jun 3, 2024 • 32