Critical Tokens Matter: Token-Level Contrastive Estimation Enhence LLM's Reasoning Capability Paper • 2411.19943 • Published Nov 29, 2024 • 57
VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation Paper • 2412.02259 • Published Dec 3, 2024 • 59
Free^2Guide: Gradient-Free Path Integral Control for Enhancing Text-to-Video Generation with Large Vision-Language Models Paper • 2411.17041 • Published Nov 26, 2024 • 12
TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models Paper • 2411.18350 • Published Nov 27, 2024 • 24
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning Paper • 2411.18203 • Published Nov 27, 2024 • 34
Identity-Preserving Text-to-Video Generation by Frequency Decomposition Paper • 2411.17440 • Published Nov 26, 2024 • 35
DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving Paper • 2411.15139 • Published Nov 22, 2024 • 15
MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation Paper • 2411.17945 • Published Nov 26, 2024 • 23
ROICtrl: Boosting Instance Control for Visual Generation Paper • 2411.17949 • Published Nov 27, 2024 • 82
Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient Paper • 2411.17787 • Published Nov 26, 2024 • 11
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models Paper • 2411.18613 • Published Nov 27, 2024 • 50
Material Anything: Generating Materials for Any 3D Object via Diffusion Paper • 2411.15138 • Published Nov 22, 2024 • 42
LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation Paper • 2411.04997 • Published Nov 7, 2024 • 37
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models of 7 sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated Nov 28, 2024 • 474
SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages Paper • 2407.19672 • Published Jul 29, 2024 • 56