DrawingSpinUp: 3D Animation from Single Character Drawings Paper • 2409.08615 • Published 7 days ago • 10
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion Paper • 2409.11406 • Published 2 days ago • 19
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think Paper • 2409.11355 • Published 2 days ago • 24
EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer Paper • 2409.10819 • Published 3 days ago • 11
Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos Paper • 2409.08353 • Published 7 days ago • 9
A Diffusion Approach to Radiance Field Relighting using Multi-Illumination Synthesis Paper • 2409.08947 • Published 6 days ago • 11
Qihoo-T2X: An Efficiency-Focused Diffusion Transformer via Proxy Tokens for Text-to-Any-Task Paper • 2409.04005 • Published 14 days ago • 16
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model Paper • 2409.01199 • Published 18 days ago • 10
SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners Paper • 2408.16768 • Published 21 days ago • 25
ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model Paper • 2408.16767 • Published 21 days ago • 29
Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation Paper • 2408.14819 • Published 24 days ago • 18
Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation Paper • 2408.15239 • Published 23 days ago • 27
SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher Paper • 2408.14176 • Published 25 days ago • 58
Training-free Long Video Generation with Chain of Diffusion Model Experts Paper • 2408.13423 • Published 27 days ago • 19
LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation Paper • 2408.13252 • Published 27 days ago • 23
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models Paper • 2308.06721 • Published Aug 13, 2023 • 28
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations Paper • 2408.12590 • Published 28 days ago • 33
Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches Paper • 2408.04567 • Published Aug 8 • 23
RayGauss: Volumetric Gaussian-Based Ray Casting for Photorealistic Novel View Synthesis Paper • 2408.03356 • Published Aug 6 • 8
Compact 3D Gaussian Splatting for Static and Dynamic Radiance Fields Paper • 2408.03822 • Published Aug 7 • 9
An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion Paper • 2408.03178 • Published Aug 6 • 35
LatentSwap: An Efficient Latent Code Mapping Framework for Face Swapping Paper • 2402.18351 • Published Feb 28 • 1
Tora: Trajectory-oriented Diffusion Transformer for Video Generation Paper • 2407.21705 • Published Jul 31 • 25
WalkTheDog: Cross-Morphology Motion Alignment via Phase Manifolds Paper • 2407.18946 • Published Jul 11 • 12
FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention Paper • 2407.19918 • Published Jul 29 • 47
MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation Paper • 2407.15060 • Published Jul 21 • 9
Floating No More: Object-Ground Reconstruction from a Single Image Paper • 2407.18914 • Published Jul 26 • 18
SHIC: Shape-Image Correspondences with no Keypoint Supervision Paper • 2407.18907 • Published Jul 26 • 38
BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation Paper • 2407.17952 • Published Jul 25 • 27
ViPer: Visual Personalization of Generative Models via Individual Preference Learning Paper • 2407.17365 • Published Jul 24 • 11
HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions Paper • 2407.15187 • Published Jul 21 • 10
Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders Paper • 2407.14435 • Published Jul 19 • 6
Understanding Reference Policies in Direct Preference Optimization Paper • 2407.13709 • Published Jul 18 • 15
Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion Paper • 2407.13759 • Published Jul 18 • 17
E5-V: Universal Embeddings with Multimodal Large Language Models Paper • 2407.12580 • Published Jul 17 • 38
Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models Paper • 2407.10285 • Published Jul 14 • 4
StyleSplat: 3D Object Style Transfer with Gaussian Splatting Paper • 2407.09473 • Published Jul 12 • 10
Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On Paper • 2407.08348 • Published Jul 11 • 51
RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models Paper • 2407.06938 • Published Jul 9 • 21
VIMI: Grounding Video Generation through Multi-modal Instruction Paper • 2407.06304 • Published Jul 8 • 9
UltraEdit: Instruction-based Fine-Grained Image Editing at Scale Paper • 2407.05282 • Published Jul 7 • 12
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs Paper • 2407.04051 • Published Jul 4 • 35
Investigating Decoder-only Large Language Models for Speech-to-text Translation Paper • 2407.03169 • Published Jul 3 • 9