Image quality issue: outputs look center-cropped, most likely a training-data problem
Generating images with https://github.com/CompVis/stable-diffusion using the default settings produces beautiful artwork... with one glaring problem: many of the images I've made come out looking like center-cropped versions of portrait-orientation originals. For example, the prompt "artstation wizard" pretty consistently yields images with this poorly-cropped look. The invocation I'm using is below.
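For reproducibility, this is essentially how I've been sampling (the stock `scripts/txt2img.py` from the repo, everything else left at its defaults):

```
python scripts/txt2img.py --prompt "artstation wizard" --plms
```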
My suspicion is that the problem lies in how the source images were prepared before training: if the training data's portrait-aspect images were coerced into a square via simple center-cropping, then this is precisely the output I'd expect the model to learn. A sketch of the kind of pipeline I mean follows.
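For concreteness, the square-ification step I have in mind looks roughly like this (an illustration only; the name `square_ify` is mine, and I haven't verified what preprocessing was actually applied to the training set):

```python
from torchvision import transforms

# A common square-ification recipe: resize the short side to 512, then
# center-crop the long side down to 512. For a portrait photo this
# discards the top and bottom of the frame -- exactly the kind of
# truncation visible in the generated samples.
square_ify = transforms.Compose([
    transforms.Resize(512),      # short side -> 512, aspect preserved
    transforms.CenterCrop(512),  # crop the long side down to 512
])
```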
Potential solutions on the training side would include:
- Use of a smart-cropping library to improve the quality of the cropping (probably the easiest option)
- Adding transparent letterboxes to pad the images to a square aspect ratio (see the sketch after this list)
- Figuring out how to get the training and sampling to work with arbitrary aspect ratios (probably the hardest option)
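Here's a minimal sketch of the letterboxing option, assuming Pillow (the helper name `letterbox_to_square` is hypothetical; a real training pipeline would also need to mask the padded region out of the loss for the padding to be truly "transparent" to the model):

```python
from PIL import Image

def letterbox_to_square(img: Image.Image, size: int = 512) -> Image.Image:
    """Pad an image onto a transparent square canvas instead of cropping it.

    Hypothetical helper for illustration -- preserves the full frame,
    at the cost of introducing padded bars along the short dimension.
    """
    img = img.convert("RGBA")
    img.thumbnail((size, size))  # fit the long side to `size`, keep aspect
    canvas = Image.new("RGBA", (size, size), (0, 0, 0, 0))  # fully transparent
    # Paste centered, leaving letterbox bars on the short dimension.
    canvas.paste(img, ((size - img.width) // 2, (size - img.height) // 2))
    return canvas
```

The tradeoff is that the model would see padded bars during training, which could leak into samples unless the padding is excluded from the loss.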
I'm not aware of any workarounds for those sampling from the model, but I'd love to hear if others have found good prompt-engineering tricks to avoid this issue, especially when generating images at a 4:3 aspect ratio, where the cropping defect is magnified.
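For reference, here's roughly how I'm generating the 4:3 images where the defect is most visible (standard `txt2img.py` flags; I keep both dimensions at multiples of 64):

```
python scripts/txt2img.py --prompt "artstation wizard" --W 512 --H 384 --plms
```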