Learning Flow Fields in Attention for Controllable Person Image Generation Paper • 2412.08486 • Published Dec 11, 2024 • 34 • 6
Let's Go Shopping (LGS) -- Web-Scale Image-Text Dataset for Visual Concept Understanding Paper • 2401.04575 • Published Jan 9, 2024 • 17 • 4
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines Paper • 2410.12705 • Published Oct 16, 2024 • 32 • 3
Guiding a Diffusion Model with a Bad Version of Itself Paper • 2406.02507 • Published Jun 4, 2024 • 17 • 1
RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots Paper • 2406.02523 • Published Jun 4, 2024 • 12 • 1
V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation Paper • 2406.02511 • Published Jun 4, 2024 • 11 • 2
I4VGen: Image as Stepping Stone for Text-to-Video Generation Paper • 2406.02230 • Published Jun 4, 2024 • 18 • 3
Self-Improving Robust Preference Optimization Paper • 2406.01660 • Published Jun 3, 2024 • 20 • 1
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models Paper • 2406.02430 • Published Jun 4, 2024 • 34 • 2
PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs Paper • 2406.02886 • Published Jun 5, 2024 • 11 • 1
Item-Language Model for Conversational Recommendation Paper • 2406.02844 • Published Jun 5, 2024 • 12 • 1
Searching Priors Makes Text-to-Video Synthesis Better Paper • 2406.03215 • Published Jun 5, 2024 • 14 • 2