license: creativeml-openrail-m

SD1.5 experiments with Huber and MSE loss. All models were trained for 4 epochs on approximately 250k images from a variety of sources: approximately half from LAION Aesthetics, plus a few thousand 4K video rips with COG-VLM captions.

Trained using the EveryDream2 trainer (https://github.com/victorchall/EveryDream2trainer) on an RTX 6000 Ada 48GB. Each epoch takes approximately 10 hours, for a total of about 40 hours per model.

- Multi-aspect-ratio training with a nominal size of <=768^2 pixels for each bucket.
- Batch size 12 with gradient accumulation of 10.
- AdamW 8-bit optimizer with standard betas of (0.9, 0.999) and weight decay of 0.010.
- Automatic mixed precision (FP16) with TF32 matmul.
- LR of 3.0e-6 on a cosine schedule with a ~12 epoch decay target, ending around 2.3e-6 at the end of training.
- Pyramid noise with a discount of 0.03.
- Zero offset noise of 0.02.
- Min SNR gamma of 5.0.
- UNet-only training; text encoder left frozen.
- Conditional dropout of 10%.
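
As a rough sanity check on the settings above, the sketch below computes the effective batch size, an approximate number of optimizer steps per epoch, and the learning rate after 4 of the 12 target epochs. It assumes a plain half-cosine decay to zero with no warmup and exactly 250,000 images; the trainer's actual schedule and the bucketed dataset size may differ, and all names here are illustrative.

```python
import math

# Values taken from the settings listed above.
BASE_LR = 3.0e-6
EPOCHS_TRAINED = 4
DECAY_TARGET_EPOCHS = 12
BATCH_SIZE = 12
GRAD_ACCUM = 10
APPROX_IMAGES = 250_000  # "approximately 250k"; assumed exact here


def cosine_lr(progress: float, base_lr: float = BASE_LR) -> float:
    """Half-cosine decay from base_lr at progress=0 to 0 at progress=1.

    Assumption: no warmup and a floor of zero.
    """
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Images consumed per optimizer step.
effective_batch = BATCH_SIZE * GRAD_ACCUM

# Rough count; aspect-ratio bucketing would change the exact number.
steps_per_epoch = APPROX_IMAGES // effective_batch

# Training stops a third of the way through the decay curve.
final_lr = cosine_lr(EPOCHS_TRAINED / DECAY_TARGET_EPOCHS)

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps per epoch (approx.): {steps_per_epoch}")
print(f"LR after {EPOCHS_TRAINED} epochs: {final_lr:.3e}")
```

Under these assumptions the learning rate after 4 epochs is 0.75 of the starting value, about 2.25e-6, which is consistent with the "around 2.3e-6" figure quoted above.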

The following models were produced:

- 768_huber.safetensors - Huber loss only
- 768_mse.safetensors - MSE loss only
- 768_ts0huber_ts999mse.safetensors - Huber loss at timestep 0 interpolated to MSE loss at timestep 999
- 768_ts0mse_ts999huber.safetensors - MSE loss at timestep 0 interpolated to Huber loss at timestep 999

Note that timestep 0 is the least-noised step and timestep 999 is the most-noised step.
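
To illustrate what a timestep-interpolated loss can look like, here is a minimal per-element sketch in plain Python. It assumes a linear blend across timesteps 0 to 999 and a Huber delta of 1.0; neither is confirmed by this card, and the function names are hypothetical rather than the trainer's API. A real trainer would apply the same idea to tensors.

```python
NUM_TRAIN_TIMESTEPS = 1000  # timesteps run from 0 (least noise) to 999 (most noise)


def mse(error: float) -> float:
    """Squared error for one element."""
    return error * error


def huber(error: float, delta: float = 1.0) -> float:
    """Huber loss for one element: quadratic near zero, linear in the tails.

    Assumption: delta=1.0; the value used for these models is not stated.
    """
    abs_err = abs(error)
    if abs_err <= delta:
        return 0.5 * error * error
    return delta * (abs_err - 0.5 * delta)


def blended_loss(error: float, timestep: int, at_ts0, at_ts999) -> float:
    """Linearly interpolate from loss `at_ts0` (timestep 0) to `at_ts999` (timestep 999)."""
    w = timestep / (NUM_TRAIN_TIMESTEPS - 1)  # 0.0 at timestep 0, 1.0 at timestep 999
    return (1.0 - w) * at_ts0(error) + w * at_ts999(error)


# Analogue of the ts0huber_ts999mse variant: Huber at low noise, MSE at high noise.
for t in (0, 500, 999):
    print(t, blended_loss(2.0, t, at_ts0=huber, at_ts999=mse))
```

Swapping the `at_ts0` and `at_ts999` arguments gives the analogue of the ts0mse_ts999huber variant. Because Huber grows linearly for large errors, whichever end of the timestep range it covers is less sensitive to outlier residuals than the MSE end.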