- Instruct-Imagen: Image Generation with Multi-modal Instruction
  Paper • 2401.01952 • Published • 30
- ODIN: A Single Model for 2D and 3D Perception
  Paper • 2401.02416 • Published • 11
- Bigger is not Always Better: Scaling Properties of Latent Diffusion Models
  Paper • 2404.01367 • Published • 20
- Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models
  Paper • 2404.02747 • Published • 11
Collections including paper arxiv:2404.09967

- VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time
  Paper • 2404.10667 • Published • 17
- AniClipart: Clipart Animation with Text-to-Video Priors
  Paper • 2404.12347 • Published • 12
- Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model
  Paper • 2404.09967 • Published • 20
- MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
  Paper • 2404.05014 • Published • 53

- MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
  Paper • 2404.05014 • Published • 53
- Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model
  Paper • 2404.09967 • Published • 20
- Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies
  Paper • 2404.08197 • Published • 27

- Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion
  Paper • 2310.03502 • Published • 77
- Transferable and Principled Efficiency for Open-Vocabulary Segmentation
  Paper • 2404.07448 • Published • 11
- Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
  Paper • 2404.07973 • Published • 30
- COCONut: Modernizing COCO Segmentation
  Paper • 2404.08639 • Published • 27

- CiaraRowles/TemporalDiff
  Text-to-Video • Updated • 169
- Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model
  Paper • 2404.09967 • Published • 20
- Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
  Paper • 2406.06525 • Published • 64

- On the Scalability of Diffusion-based Text-to-Image Generation
  Paper • 2404.02883 • Published • 17
- InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation
  Paper • 2404.02733 • Published • 20
- CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
  Paper • 2404.03653 • Published • 33
- ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
  Paper • 2404.07987 • Published • 47

- Adding Conditional Control to Text-to-Image Diffusion Models
  Paper • 2302.05543 • Published • 40
- LightIt: Illumination Modeling and Control for Diffusion Models
  Paper • 2403.10615 • Published • 16
- SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions
  Paper • 2403.16627 • Published • 20
- DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion
  Paper • 2403.17237 • Published • 9

- Video as the New Language for Real-World Decision Making
  Paper • 2402.17139 • Published • 18
- Learning and Leveraging World Models in Visual Representation Learning
  Paper • 2403.00504 • Published • 31
- MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies
  Paper • 2403.01422 • Published • 26
- VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models
  Paper • 2403.05438 • Published • 18