Hao Yu

Jerydeak

AI & ML interests

NLP

Recent Activity

upvoted a paper about 2 months ago

VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control

upvoted a paper about 2 months ago

Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models

upvoted a paper about 2 months ago

VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation

View all activity

Organizations

Jerydeak's activity

upvoted 3 papers about 2 months ago

VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control

Paper • 2412.20800 • Published Dec 30, 2024 • 10

Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models

Paper • 2501.01423 • Published Jan 2 • 37

VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation

Paper • 2412.21059 • Published Dec 30, 2024 • 18

upvoted 9 papers 2 months ago

StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements

Paper • 2412.08503 • Published Dec 11, 2024 • 8

StyleMaster: Stylize Your Video with Artistic Generation and Translation

Paper • 2412.07744 • Published Dec 10, 2024 • 19

Learning Flow Fields in Attention for Controllable Person Image Generation

Paper • 2412.08486 • Published Dec 11, 2024 • 33

LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations

Paper • 2412.08580 • Published Dec 11, 2024 • 45

UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics

Paper • 2412.07774 • Published Dec 10, 2024 • 28

DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation

Paper • 2412.07589 • Published Dec 10, 2024 • 45

upvoted 6 papers 3 months ago

BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks

Paper • 2412.04626 • Published Dec 5, 2024 • 14

CompCap: Improving Multimodal Large Language Models with Composite Captions

Paper • 2412.05243 • Published Dec 6, 2024 • 19

MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale

Paper • 2412.05237 • Published Dec 6, 2024 • 47

ZipAR: Accelerating Autoregressive Image Generation through Spatial Locality

Paper • 2412.04062 • Published Dec 5, 2024 • 9

OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows

Paper • 2412.01169 • Published Dec 2, 2024 • 13

Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

Paper • 2412.04431 • Published Dec 5, 2024 • 18

upvoted a paper 4 months ago

AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents

Paper • 2410.24024 • Published Oct 31, 2024 • 49

upvoted a paper 9 months ago

2BP: 2-Stage Backpropagation

Paper • 2405.18047 • Published May 28, 2024 • 25