Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2404.09967

Diffusion Models

Instruct-Imagen: Image Generation with Multi-modal Instruction

Paper • 2401.01952 • Published Jan 3 • 30
ODIN: A Single Model for 2D and 3D Perception

Paper • 2401.02416 • Published Jan 4 • 11
Bigger is not Always Better: Scaling Properties of Latent Diffusion Models

Paper • 2404.01367 • Published Apr 1 • 20
Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models

Paper • 2404.02747 • Published Apr 3 • 11

Video Generation

VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time

Paper • 2404.10667 • Published Apr 16 • 17
AniClipart: Clipart Animation with Text-to-Video Priors

Paper • 2404.12347 • Published Apr 18 • 12
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model

Paper • 2404.09967 • Published Apr 15 • 20
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators

Paper • 2404.05014 • Published Apr 7 • 53

Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model

Paper • 2404.09967 • Published Apr 15 • 20
Compositional Text-to-Image Generation with Dense Blob Representations

Paper • 2405.08246 • Published May 14 • 12

Papers - University of North Carolina Chapel Hill

Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model

Paper • 2404.09967 • Published Apr 15 • 20
DOCCI: Descriptions of Connected and Contrasting Images

Paper • 2404.19753 • Published Apr 30 • 11

MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators

Paper • 2404.05014 • Published Apr 7 • 53
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model

Paper • 2404.09967 • Published Apr 15 • 20
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies

Paper • 2404.08197 • Published Apr 12 • 27

Papers - Image - Coco Testing

about 2 hours ago

Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion

Paper • 2310.03502 • Published Oct 5, 2023 • 77
Transferable and Principled Efficiency for Open-Vocabulary Segmentation

Paper • 2404.07448 • Published Apr 11 • 11
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

Paper • 2404.07973 • Published Apr 11 • 30
COCONut: Modernizing COCO Segmentation

Paper • 2404.08639 • Published Apr 12 • 27

CiaraRowles/TemporalDiff

Text-to-Video • Updated Sep 30, 2023 • 169
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model

Paper • 2404.09967 • Published Apr 15 • 20
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

Paper • 2406.06525 • Published Jun 10 • 64

Papers - Image - SDXL

about 2 hours ago

On the Scalability of Diffusion-based Text-to-Image Generation

Paper • 2404.02883 • Published Apr 3 • 17
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation

Paper • 2404.02733 • Published Apr 3 • 20
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

Paper • 2404.03653 • Published Apr 4 • 33
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback

Paper • 2404.07987 • Published Apr 11 • 47

Papers - ControlNet

Adding Conditional Control to Text-to-Image Diffusion Models

Paper • 2302.05543 • Published Feb 10, 2023 • 40
LightIt: Illumination Modeling and Control for Diffusion Models

Paper • 2403.10615 • Published Mar 15 • 16
SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions

Paper • 2403.16627 • Published Mar 25 • 20
DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion

Paper • 2403.17237 • Published Mar 25 • 9

Video as the New Language for Real-World Decision Making

Paper • 2402.17139 • Published Feb 27 • 18
Learning and Leveraging World Models in Visual Representation Learning

Paper • 2403.00504 • Published Mar 1 • 31
MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies

Paper • 2403.01422 • Published Mar 3 • 26
VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models

Paper • 2403.05438 • Published Mar 8 • 18

Previous
1
2
Next

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs