rowan-gim's Collections
paper to review
VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
Paper • 2312.02087 • 20 upvotes
FaceStudio: Put Your Face Everywhere in Seconds
Paper • 2312.02663 • 30 upvotes
Orthogonal Adaptation for Modular Customization of Diffusion Models
Paper • 2312.02432 • 12 upvotes
ReconFusion: 3D Reconstruction with Diffusion Priors
Paper • 2312.02981 • 8 upvotes
ImageDream: Image-Prompt Multi-view Diffusion for 3D Generation
Paper • 2312.02201 • 31 upvotes
Fine-grained Controllable Video Generation via Object Appearance and Context
Paper • 2312.02919 • 10 upvotes
VideoBooth: Diffusion-based Video Generation with Image Prompts
Paper • 2312.00777 • 21 upvotes
MoMask: Generative Masked Modeling of 3D Human Motions
Paper • 2312.00063 • 15 upvotes
Make Pixels Dance: High-Dynamic Video Generation
Paper • 2311.10982 • 68 upvotes
Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis
Paper • 2312.03491 • 34 upvotes
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Paper • 2312.03818 • 32 upvotes
AnimateZero: Video Diffusion Models are Zero-Shot Image Animators
Paper • 2312.03793 • 17 upvotes
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
Paper • 2312.04461 • 57 upvotes
Controllable Human-Object Interaction Synthesis
Paper • 2312.03913 • 22 upvotes
Photorealistic Video Generation with Diffusion Models
Paper • 2312.06662 • 23 upvotes
Context Tuning for Retrieval Augmented Generation
Paper • 2312.05708 • 16 upvotes
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Paper • 2312.09911 • 53 upvotes
DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models
Paper • 2312.09767 • 25 upvotes
Faithful Persona-based Conversational Dataset Generation with Large Language Models
Paper • 2312.10007 • 6 upvotes
VecFusion: Vector Font Generation with Diffusion
Paper • 2312.10540 • 21 upvotes
GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning
Paper • 2312.11461 • 18 upvotes
VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder
Paper • 2312.11459 • 5 upvotes
VidToMe: Video Token Merging for Zero-Shot Video Editing
Paper • 2312.10656 • 10 upvotes
Gemini: A Family of Highly Capable Multimodal Models
Paper • 2312.11805 • 45 upvotes
VideoPoet: A Large Language Model for Zero-Shot Video Generation
Paper • 2312.14125 • 44 upvotes
Scalable Pre-training of Large Autoregressive Image Models
Paper • 2401.08541 • 35 upvotes
Aria Everyday Activities Dataset
Paper • 2402.13349 • 29 upvotes
SDXL-Lightning: Progressive Adversarial Diffusion Distillation
Paper • 2402.13929 • 27 upvotes
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Paper • 2402.12226 • 40 upvotes
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Paper • 2402.17177 • 88 upvotes
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Paper • 2402.17485 • 188 upvotes
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
Paper • 2402.17723 • 16 upvotes
Design2Code: How Far Are We From Automating Front-End Engineering?
Paper • 2403.03163 • 93 upvotes
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Paper • 2403.03100 • 34 upvotes
Personalized Audiobook Recommendations at Spotify Through Graph Neural Networks
Paper • 2403.05185 • 20 upvotes
RAFT: Adapting Language Model to Domain Specific RAG
Paper • 2403.10131 • 67 upvotes
Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces
Paper • 2403.20275 • 8 upvotes
PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations
Paper • 2404.04421 • 16 upvotes
Audio Dialogues: Dialogues dataset for audio and music understanding
Paper • 2404.07616 • 15 upvotes
KAN: Kolmogorov-Arnold Networks
Paper • 2404.19756 • 108 upvotes
Layer-Condensed KV Cache for Efficient Inference of Large Language Models
Paper • 2405.10637 • 19 upvotes