Training-free Regional Prompting for Diffusion Transformers • Paper • 2411.02395 • Published 5 days ago • 22
InstantIR: Blind Image Restoration with Instant Generative Reference • Paper • 2410.06551 • Published Oct 9 • 6
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent • Paper • 2411.02265 • Published 6 days ago • 22
Zero-shot Model-based Reinforcement Learning using Large Language Models • Paper • 2410.11711 • Published 26 days ago • 8
CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution • Paper • 2410.16256 • Published 19 days ago • 58
Pangea • Collection • A Fully Open Multilingual Multimodal LLM for 39 Languages • 18 items • Updated 8 days ago • 17
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages • Paper • 2410.16153 • Published 20 days ago • 42
AutoTrain: No-code training for state-of-the-art models • Paper • 2410.15735 • Published 20 days ago • 55
DPLM-2: A Multimodal Diffusion Protein Language Model • Paper • 2410.13782 • Published 23 days ago • 19
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines • Paper • 2410.12705 • Published 25 days ago • 29
Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities • Paper • 2410.11190 • Published 26 days ago • 20
Can MLLMs Understand the Deep Implication Behind Chinese Images? • Paper • 2410.13854 • Published 23 days ago • 8
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation • Paper • 2410.13848 • Published 23 days ago • 27
VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI • Paper • 2410.11623 • Published 26 days ago • 46
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks • Paper • 2410.10563 • Published 27 days ago • 36