PixelNet (Thomas Eding)

About:

PixelNet is a ControlNet model for Stable Diffusion.

It takes a checkerboard image as input, which is used to control where logical pixels are to be placed.

This is currently an experimental proof of concept. I trained this on around 2000 pixel-art/pixelated images that I generated using Stable Diffusion (with a lot of cleanup and manual curation). The model is not very good, but it does work on grids of up to about 64 checker "pixels" along the smallest dimension when that dimension is 512 pixels. I can successfully get the model to understand 128x128 checkerboards for image generations of at least 1024x1024 pixels.
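Both figures above work out to roughly 8 image pixels per checker cell (512 / 64 and 1024 / 128). The sketch below turns that into a rule of thumb. The 8-pixel minimum is only an inference from those two data points, not a documented limit, and max_grid_cells is a hypothetical helper name.

```python
def max_grid_cells(smallest_dim_px: int, min_cell_px: int = 8) -> int:
    """Estimate the largest checker count along the smallest image dimension.

    min_cell_px=8 is an assumption inferred from 512/64 and 1024/128;
    the model itself does not enforce or document this value.
    """
    if smallest_dim_px <= 0 or min_cell_px <= 0:
        raise ValueError("dimensions must be positive")
    return smallest_dim_px // min_cell_px


if __name__ == "__main__":
    for size in (512, 768, 1024):
        print(size, max_grid_cells(size))
```

Treat the result as a starting point for experiments rather than a hard ceiling.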

The model works best with the "Balanced" ControlNet setting. Try using a "Control Weight" of 1 or a little higher.

"ControlNet Is More Important" seems to require a heavy "Control Weight" setting to have an effect. Try using a "Control Weight" of 2.

A low "Control Weight" setting seems to produce images that resemble smooth paintings or vector art.

Smaller checker grids tend to perform worse (e.g. 5x5 vs. 32x32).

A "Steps" value that is too low or too high breaks the model. Try something like 15-30; the best value depends on an assortment of factors. Feel free to experiment with the built-in A1111 "X/Y/Z Plot" script.

Usage:

To install, copy the .safetensors and .yaml files to your Automatic1111 ControlNet extension's model directory (e.g. stable-diffusion-webui/extensions/sd-webui-controlnet/models). Completely restart the Automatic1111 server after doing this and then refresh the web page.

There is no preprocessor. Instead, supply a black and white checkerboard image as the control input. Various control image grids can be found in this repository's grids directory. (https://huggingface.co./thomaseding/pixelnet/resolve/main/grids/grids.zip)

The script gen_checker.py can be used to generate checkerboard images of arbitrary sizes. (https://huggingface.co./thomaseding/pixelnet/blob/main/gen_checker.py) For example, the following generates a 70x70 checkerboard image upscaled to 512x512 pixels:

    python gen_checker.py --upscale-dims 512x512 --dims 70x70 --output-file control.png
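For readers who want to see what building such a control image involves, here is a minimal standard-library sketch. It is not the code of gen_checker.py: the function names checkerboard and encode_gray_png are hypothetical, and the nearest-neighbor upscaling and the white top-left cell are assumptions that the real script may not share.

```python
import struct
import zlib


def checkerboard(grid_w: int, grid_h: int, out_w: int, out_h: int) -> list:
    """Return out_h rows of out_w grayscale values (0 or 255).

    Each output pixel is mapped to a grid cell by nearest-neighbor
    scaling (an assumption); the top-left cell is white (also assumed).
    """
    rows = []
    for y in range(out_h):
        cell_y = y * grid_h // out_h
        rows.append([
            255 if (x * grid_w // out_w + cell_y) % 2 == 0 else 0
            for x in range(out_w)
        ])
    return rows


def _chunk(tag: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, tag, data, CRC over tag + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def encode_gray_png(rows: list) -> bytes:
    """Encode rows of 0..255 ints as an 8-bit grayscale PNG."""
    height, width = len(rows), len(rows[0])
    # Every scanline is prefixed with filter type 0 (no filtering).
    raw = b"".join(b"\x00" + bytes(row) for row in rows)
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # then compression, filter, and interlace methods all 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(raw))
        + _chunk(b"IEND", b"")
    )
```

Writing encode_gray_png(checkerboard(70, 70, 512, 512)) to a file opened in "wb" mode should give an image comparable to the command above. In practice an imaging library such as Pillow would be the more usual choice; the standard library is used here only to keep the sketch dependency-free.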

The script controlled_downscale.py is a custom downscaler made specifically for this model. You provide both the generated image and the control image used to generate it, and it downscales according to the control grid. (https://huggingface.co./thomaseding/pixelnet/blob/main/controlled_downscale.py) Example:

    python controlled_downscale.py --control diffusion_control.png --input diffusion_output.png --output-downscaled downscaled.png --output-quantized quantized.png --trim-cropped-edges false --sample-radius 2

See --help for more info.
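The idea of downscaling "according to the control grid" can be illustrated with a small sketch. This is one plausible approach and not the algorithm of controlled_downscale.py, whose internals are not described here. It assumes grayscale images stored as nested lists, finds cell boundaries from the first row and column of the control image, and takes the median of the pixels within a radius of each cell's center. The function names and the exact meaning of radius are hypothetical.

```python
from statistics import median


def cell_spans(line: list) -> list:
    """Split a 1-D run of control values into (start, end) spans of equal value."""
    spans, start = [], 0
    for i in range(1, len(line) + 1):
        if i == len(line) or line[i] != line[start]:
            spans.append((start, i))
            start = i
    return spans


def controlled_downscale(image: list, control: list, radius: int = 0) -> list:
    """Produce one output pixel per checker cell of the control image.

    image and control are same-sized 2-D lists of grayscale ints.
    Samples are clamped to the cell so neighbors never bleed in.
    """
    col_spans = cell_spans(control[0])
    row_spans = cell_spans([row[0] for row in control])
    out = []
    for y0, y1 in row_spans:
        cy = (y0 + y1 - 1) // 2  # center row of this cell
        out_row = []
        for x0, x1 in col_spans:
            cx = (x0 + x1 - 1) // 2  # center column of this cell
            samples = [
                image[y][x]
                for y in range(max(y0, cy - radius), min(y1, cy + radius + 1))
                for x in range(max(x0, cx - radius), min(x1, cx + radius + 1))
            ]
            # median() may return a float for an even sample count; truncate.
            out_row.append(int(median(samples)))
        out.append(out_row)
    return out
```

Sampling near cell centers avoids the blurry boundaries between logical pixels, which is where a uniform downscaler tends to pick up artifacts. An RGB version would apply the same sampling per channel.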

FAQ:

Q: Are there any "Trigger Words" for this model?

A: Not really. I removed all words pertaining to style from my training data. This includes words like "pixel", "high quality", etc. In fact, adding "pixel art" to the prompt seems to make the model perform worse (in my experience). One word I do find useful is "garish", added to the negative prompt when the output colors are too intense.

Q: PNG or JPEG?

A: Use PNG. JPEG's lossy compression is terrible for pixel art because it introduces artifacts around hard edges.

Q: Why is this needed? Can't I use a post-processor to downscale the image?

A: In my experience, SD has a hard time creating genuine pixel art (even with dedicated base models and LoRAs): its output has mismatched logical pixel sizes, smooth curves, etc. What appears to be a straight line at a glance might bend around. This can cause post-processors to create artifacts when quantization rounds a pixel to a position one pixel off in some direction. This model is intended to help fix that.

Q: Is there special A1111 user-interface integration?

A: Yes... but not yet merged into the standard ControlNet extension's code. See (https://civitai.com/posts/371477) if you want to integrate the changes yourself in the meantime.

Q: Should I use this model with a post-processor?

A: Yes, I still recommend you do post-processing to clean up the image. This model is not perfect and will still have artifacts. Note that none of the sample output images are post-processed; they are raw outputs from the model. Consider sampling the image based on the location of the control grid checker faces. The provided controlled_downscale.py script can do this for you. You can take the output of this script (presumably the --output-downscaled file) and then run it through a different post-processor (e.g. to refine the color palette). I only tested the script on a few generated images, so it might still be a bit buggy in the way it computes the sample locations. For now, compare the script's output against the original generated image. You may find that supplying an alternative control grid image helps, or that some alternative post-processing method works better.

Q: Does the model support non-square grids?

A: Kind of. I trained it with some grids whose faces are not perfect squares (which happens when the checkerboard's dimensions do not evenly divide the upscaled image size), so in that sense it should work fine. I also trained it with some checkerboard images with genuine non-square rectangular faces (e.g. double-wide pixels).
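To see why faces stop being perfect squares, consider how many image pixels each cell receives along one axis. The sketch below assumes nearest-neighbor upscaling, where pixel x belongs to cell x * cells // out_px. That mapping is an assumption, gen_checker.py may round differently, and the helper names are hypothetical.

```python
from collections import Counter


def cell_edges(cells: int, out_px: int) -> list[int]:
    """Pixel index where each cell starts, plus the final end edge.

    Cell i starts at ceil(i * out_px / cells), computed with integers.
    """
    return [-((-i * out_px) // cells) for i in range(cells + 1)]


def cell_widths(cells: int, out_px: int) -> list[int]:
    """Width in pixels of every cell along one axis."""
    edges = cell_edges(cells, out_px)
    return [end - start for start, end in zip(edges, edges[1:])]


if __name__ == "__main__":
    # 64 cells divide 512 pixels evenly; 70 cells do not.
    print(Counter(cell_widths(64, 512)))
    print(Counter(cell_widths(70, 512)))
```

Under this assumption, a 70-cell axis at 512 pixels yields 48 cells that are 7 pixels wide and 22 that are 8 pixels wide, so some faces end up 7x8 or 8x7 rather than square.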

Q: Will there be a better trained model of this in the future?

A: I hope so. I will need to curate a much larger and higher-quality dataset, which might take me a long time. Regardless, I plan on making the control effect more faithful to the control image. I may decide to try to generalize this beyond rectangular grids, but that is not a priority. I think including non-square rectangular faces in some of the training data was perhaps harmful to the model's performance. Likewise for grids smaller than 8x8. Perhaps it is better to train separate models for very small grids (but at that point, you might as well make the images by hand) and for non-square rectangular grids.

Q: What about color quantization?

A: Coming soon, "PaletteNet".
