Collections
Discover the best community collections!
Collections including paper arxiv:2411.10440
-
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems
Paper • 2411.02959 • Published • 65 -
GarVerseLOD: High-Fidelity 3D Garment Reconstruction from a Single In-the-Wild Image using a Dataset with Levels of Details
Paper • 2411.03047 • Published • 8 -
MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D
Paper • 2411.02336 • Published • 23 -
GenXD: Generating Any 3D and 4D Scenes
Paper • 2411.02319 • Published • 20
-
LLaVA-o1: Let Vision Language Models Reason Step-by-Step
Paper • 2411.10440 • Published • 113 -
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices
Paper • 2411.10640 • Published • 44 -
Knowledge Transfer Across Modalities with Natural Language Supervision
Paper • 2411.15611 • Published • 15 -
Critical Tokens Matter: Token-Level Contrastive Estimation Enhence LLM's Reasoning Capability
Paper • 2411.19943 • Published • 56
-
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
Paper • 2410.17243 • Published • 89 -
AnimateAnything: Consistent and Controllable Animation for Video Generation
Paper • 2411.10836 • Published • 23 -
LLaVA-o1: Let Vision Language Models Reason Step-by-Step
Paper • 2411.10440 • Published • 113 -
MagicQuill: An Intelligent Interactive Image Editing System
Paper • 2411.09703 • Published • 63