LAB-Bench: Measuring Capabilities of Language Models for Biology Research Paper • 2407.10362 • Published Jul 14, 2024 • 4
SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound Paper • 2406.06612 • Published Jun 6, 2024 • 14
🎠Avatars Collection The latest AI-powered technologies usher in a new era of realistic avatars! 🚀 • 66 items • Updated 7 days ago • 74
Video as the New Language for Real-World Decision Making Paper • 2402.17139 • Published Feb 27, 2024 • 18
Learning Continuous 3D Words for Text-to-Image Generation Paper • 2402.08654 • Published Feb 13, 2024 • 9
MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models Paper • 2402.06178 • Published Feb 9, 2024 • 13
Memory Consolidation Enables Long-Context Video Understanding Paper • 2402.05861 • Published Feb 8, 2024 • 8
InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions Paper • 2402.03040 • Published Feb 5, 2024 • 17
Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion Paper • 2402.03162 • Published Feb 5, 2024 • 17
WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models Paper • 2401.13919 • Published Jan 25, 2024 • 23
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations Paper • 2401.01885 • Published Jan 3, 2024 • 27
Improving Diffusion-Based Image Synthesis with Context Prediction Paper • 2401.02015 • Published Jan 4, 2024 • 6
VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM Paper • 2401.01256 • Published Jan 2, 2024 • 19
A Recipe for Scaling up Text-to-Video Generation with Text-free Videos Paper • 2312.15770 • Published Dec 25, 2023 • 12
I2V-Adapter: A General Image-to-Video Adapter for Video Diffusion Models Paper • 2312.16693 • Published Dec 27, 2023 • 13
DreamTuner: Single Image is Enough for Subject-Driven Generation Paper • 2312.13691 • Published Dec 21, 2023 • 26
StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation Paper • 2312.12491 • Published Dec 19, 2023 • 69
Clockwork Diffusion: Efficient Generation With Model-Step Distillation Paper • 2312.08128 • Published Dec 13, 2023 • 12
Pearl: A Production-ready Reinforcement Learning Agent Paper • 2312.03814 • Published Dec 6, 2023 • 14
VideoBooth: Diffusion-based Video Generation with Image Prompts Paper • 2312.00777 • Published Dec 1, 2023 • 20
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning Paper • 2311.12631 • Published Nov 21, 2023 • 13
NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation Paper • 2311.12229 • Published Nov 20, 2023 • 26
MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer Paper • 2311.12052 • Published Nov 18, 2023 • 32
AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort Paper • 2311.11243 • Published Nov 19, 2023 • 14
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models Paper • 2311.10093 • Published Nov 16, 2023 • 56
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module Paper • 2311.05556 • Published Nov 9, 2023 • 79
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models Paper • 2311.04145 • Published Nov 7, 2023 • 32
A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation Paper • 2310.16656 • Published Oct 25, 2023 • 39
CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images Paper • 2310.16825 • Published Oct 25, 2023 • 31
Tracking Anything with Decoupled Video Segmentation Paper • 2309.03903 • Published Sep 7, 2023 • 27