arXiv:2410.02416

Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models

Published on Oct 3 · Submitted by msadat97 on Oct 4


Authors: Seyedmorteza Sadat, Romann M. Weber

Abstract

Classifier-free guidance (CFG) is crucial for improving both generation quality and alignment between the input condition and final output in diffusion models. While a high guidance scale is generally required to enhance these aspects, it also causes oversaturation and unrealistic artifacts. In this paper, we revisit the CFG update rule and introduce modifications to address this issue. We first decompose the update term in CFG into parallel and orthogonal components with respect to the conditional model prediction and observe that the parallel component primarily causes oversaturation, while the orthogonal component enhances image quality. Accordingly, we propose down-weighting the parallel component to achieve high-quality generations without oversaturation. Additionally, we draw a connection between CFG and gradient ascent and introduce a new rescaling and momentum method for the CFG update rule based on this insight. Our approach, termed adaptive projected guidance (APG), retains the quality-boosting advantages of CFG while enabling the use of higher guidance scales without oversaturation. APG is easy to implement and introduces practically no additional computational overhead to the sampling process. Through extensive experiments, we demonstrate that APG is compatible with various conditional diffusion models and samplers, leading to improved FID, recall, and saturation scores while maintaining precision comparable to CFG, making our method a superior plug-and-play alternative to standard classifier-free guidance.


Community

msadat97

Paper author · Paper submitter · 13 days ago (edited 13 days ago)

TL;DR: High CFG scales improve both the quality of generations and the alignment between the input condition and the output. However, they also cause oversaturation and artifacts. We show that with a few modifications to how the CFG update is applied at inference, we can largely mitigate the oversaturation and artifacts of high guidance scales.

The method is easy to implement, and the source code for APG is given in Algorithm 1 (page 22). Please also note that we observed better results when applying APG to denoised predictions, as shown in Figure 12. We give details on how to obtain denoised predictions for various diffusion model formulations in Appendix B.
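For a model that predicts noise in a variance-preserving (DDPM-style) parameterization, one standard way to obtain a denoised prediction is `x0_hat = (x_t - sqrt(1 - alpha_bar) * eps_hat) / sqrt(alpha_bar)`. The sketch below assumes that parameterization (`alpha_bar` is the cumulative noise-schedule product at step t); Appendix B of the paper covers other formulations. It converts both predictions to denoised space, applies an arbitrary guidance function there, and converts the result back to a noise prediction. All helper names, and the scale of 4.0 in `cfg`, are hypothetical.

```python
import numpy as np


def eps_to_denoised(x_t, eps, alpha_bar):
    # Invert x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps for x0.
    return (x_t - np.sqrt(1.0 - alpha_bar) * eps) / np.sqrt(alpha_bar)


def denoised_to_eps(x_t, x0, alpha_bar):
    # Same relation, solved for eps instead.
    return (x_t - np.sqrt(alpha_bar) * x0) / np.sqrt(1.0 - alpha_bar)


def guide_in_denoised_space(x_t, eps_cond, eps_uncond, alpha_bar, guidance_fn):
    # Apply guidance to denoised predictions, then return a guided eps.
    x0_cond = eps_to_denoised(x_t, eps_cond, alpha_bar)
    x0_uncond = eps_to_denoised(x_t, eps_uncond, alpha_bar)
    x0_guided = guidance_fn(x0_cond, x0_uncond)
    return denoised_to_eps(x_t, x0_guided, alpha_bar)


def cfg(cond, uncond, scale=4.0):
    # Plain classifier-free guidance, used here as an example guidance_fn.
    return uncond + scale * (cond - uncond)
```

Because both conversions are affine for a fixed `x_t`, plain CFG yields the same noise prediction whichever space it is applied in; the space only starts to matter once nonlinear steps, such as projecting onto the conditional prediction or clipping the update norm, enter the guidance function.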

Edit: please see the following thread on X for a more in-depth discussion of the method.
https://twitter.com/Msadat97/status/1842246601181646912

librarian-bot

12 days ago

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API.

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space.

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

dunkeroni

11 days ago

I can't help but notice how similar your application is to PID loops in control theory. You have a proportional gain and negative derivative gain, which are both very typical. Assuming the implication here is that the diffusion process (at least in the parallel component) is simulating a step response to a signal from the model, it might be possible to get even better results by introducing a small integral gain as well. If your current PD feedback is undershooting the ideal result, PID would resolve it.
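To make the PID suggestion concrete, here is a hypothetical sketch that treats the guidance difference at each sampling step as the "error" signal and combines proportional, integral, and derivative terms over successive steps. The class name, the choice of error signal, and the gains are illustrative assumptions; neither the paper nor this thread specifies or evaluates such a scheme.

```python
class PIDGuidanceTerm:
    """Hypothetical discrete PID filter over per-step guidance differences."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0   # running sum of past errors (I term)
        self.previous = None  # last error, for the finite-difference D term

    def update(self, error):
        # `error` could be, e.g., pred_cond - pred_uncond at the current step;
        # plain floats and NumPy arrays both work.
        self.integral = self.integral + error
        derivative = 0.0 if self.previous is None else error - self.previous
        self.previous = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

With `ki = kd = 0` this reduces to scaling the current difference, which is the proportional term alone; a small nonzero `ki` is the integral gain the comment proposes adding.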

msadat97

Paper author 10 days ago

Thank you for your interest in our work! The idea of linking classifier-free guidance to control theory is very interesting and novel. We’d be happy to consider it in future revisions, especially if an immediate application emerges that could further enhance APG.

Please feel free to reach out with any further questions or if you'd like to continue the discussion.