WorldSimBench: Towards Video Generation Models as World Simulators Paper • 2410.18072 • Published 18 days ago • 16
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models Paper • 2408.04840 • Published Aug 9 • 31
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture Paper • 2405.18991 • Published May 29 • 12
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception Paper • 2401.16158 • Published Jan 29 • 17
LLM in a flash: Efficient Large Language Model Inference with Limited Memory Paper • 2312.11514 • Published Dec 12, 2023 • 258
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration Paper • 2311.04257 • Published Nov 7, 2023 • 20
Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation Paper • 2306.07954 • Published Jun 13, 2023 • 113