RedHotTensors committed
Commit e66c074
1 Parent(s): 064795b

Update README.md

Files changed (1)
  1. README.md +30 -29
README.md CHANGED
@@ -1,50 +1,51 @@
  ---
  library_name: diffusers
  license: cc-by-nc-sa-4.0
  ---
  # Furception v1.0, by Project RedRocket.
-
  This is a VAE decoder finetune, resumed from stabilityai/sd-vae-ft-mse using images from e621. It is trained with a mixture of MAE and MSE loss to maintain an acceptable balance between sharp and smooth outputs, and loss is calculated in Oklab color space in order to prioritize image reconstruction based on which color channels are more perceptually significant.

  Our testing has shown that the VAE is good at eliminating unwanted high-frequency noise when used on models trained on similar data. Results are far more apparent on flat-colored images than on realistic or painterly images, but we have not noticed any obvious loss of performance on any type of image. The effects are also more noticeable on lower-resolution generated images, but there are improvements at all resolutions. It may generalize somewhat to a broader range of art styles due to the variety of styles in the dataset.

- Default VAE (kl-f8):
- ![Default VAE](crop3[1].png)
- Furception 1.0:
- ![Our VAE](crop4[1].png)

  Note that the output is overall smoother and has significantly less artifacting around edges in high-detail regions.

- #### Training details:
  Overall training is fundamentally similar to LDM. We used the same relative base weights for MAE, MSE, and LPIPS as LDM (following sd-vae-ft-mse in the case of LPIPS). The discriminator's weight in the loss objective is dynamically set so that the gradient norm for the discriminator is half that of the reconstruction loss, as in LDM. The discriminator is similar to LDM's, except reparameterized to a Wasserstein loss with gradient penalty and with its group norm layers replaced by layer norms.

  Training for version 1.0 used random square crops at various levels of downscale (Lanczos with antialiasing), randomly rotated and flipped. Training ran for 150,000 steps at a batch size of 32. EMA weights were accumulated using a decay similar to sd-vae-ft-mse's, scaled for our batch size, and are the release version of the model.

- #### Credits:
- Development and research led by @drhead.
-
- With research and development assistance by @RedHotTensors.
-
- And additional research assistance by @lodestones and Thessalo.
-
- Dataset curation by @lodestones and Bannanapuncakes, with additional curation by @RedHotTensors.
-
  And thanks to dogarrowtype for system administration assistance.

- #### Based on:
- CompVis Latent Diffusion: https://github.com/CompVis/latent-diffusion/
-
- StabilityAI sd-vae-ft-mse: https://huggingface.co/stabilityai/sd-vae-ft-mse
-
- LPIPS by Richard Zhang, et al.: https://github.com/richzhang/PerceptualSimilarity
-
- Oklab by Björn Ottosson: https://bottosson.github.io/posts/oklab/
-
- fine-tune-models by Jonathan Chang: https://github.com/cccntu/fine-tune-models/
-
- #### Built on:
- Flax by Google Brain: https://github.com/google/flax

- And Huggingface Diffusers: https://github.com/huggingface/diffusers

  With deep thanks to the innumerable artists who released their works to the public for fair use in this non-commercial research project.
 
  ---
  library_name: diffusers
  license: cc-by-nc-sa-4.0
+ datasets:
+ - lodestones/e6-dump
  ---
  # Furception v1.0, by Project RedRocket.
  This is a VAE decoder finetune, resumed from stabilityai/sd-vae-ft-mse using images from e621. It is trained with a mixture of MAE and MSE loss to maintain an acceptable balance between sharp and smooth outputs, and loss is calculated in Oklab color space in order to prioritize image reconstruction based on which color channels are more perceptually significant.
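
To make the Oklab objective concrete, here is a minimal sketch of an MAE + MSE reconstruction loss evaluated in Oklab space. The sRGB-to-Oklab matrices come from Björn Ottosson's reference implementation (linked under "Based on" below); the loss weights, the assumption of linear-sRGB inputs, and the simple clamp before the cube root are illustrative placeholders, not the project's actual Flax training code.

```python
import torch

# Linear sRGB -> Oklab, using the matrices from Björn Ottosson's reference implementation.
_RGB_TO_LMS = torch.tensor([
    [0.4122214708, 0.5363325363, 0.0514459929],
    [0.2119034982, 0.6806995451, 0.1073969566],
    [0.0883024619, 0.2817188376, 0.6299787005],
])
_LMS_TO_OKLAB = torch.tensor([
    [0.2104542553,  0.7936177850, -0.0040720468],
    [1.9779984951, -2.4285922050,  0.4505937099],
    [0.0259040371,  0.7827717662, -0.8086757660],
])

def linear_srgb_to_oklab(rgb: torch.Tensor) -> torch.Tensor:
    """rgb: (N, 3, H, W) linear sRGB -> (N, 3, H, W) Oklab (L, a, b)."""
    lms = torch.einsum("ij,njhw->nihw", _RGB_TO_LMS.to(rgb), rgb)
    # Clamp before the cube root so out-of-gamut decoder outputs don't produce NaNs
    # (Ottosson's C reference uses a signed cube root instead).
    lms = lms.clamp(min=0.0).pow(1.0 / 3.0)
    return torch.einsum("ij,njhw->nihw", _LMS_TO_OKLAB.to(lms), lms)

def oklab_reconstruction_loss(recon, target, w_mae=1.0, w_mse=1.0):
    """MAE + MSE mixture computed in Oklab space (relative weights are placeholders)."""
    diff = linear_srgb_to_oklab(recon) - linear_srgb_to_oklab(target)
    return w_mae * diff.abs().mean() + w_mse * diff.pow(2).mean()
```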

  Our testing has shown that the VAE is good at eliminating unwanted high-frequency noise when used on models trained on similar data. Results are far more apparent on flat-colored images than on realistic or painterly images, but we have not noticed any obvious loss of performance on any type of image. The effects are also more noticeable on lower-resolution generated images, but there are improvements at all resolutions. It may generalize somewhat to a broader range of art styles due to the variety of styles in the dataset.

+ <div style="width: max-content; margin: 0 auto 0 auto;">
+
+ | Default VAE (kl-f8) | Furception v1.0 |
+ |:----------------------------:|:------------------------:|
+ | ![Default VAE](crop3[1].png) | ![Our VAE](crop4[1].png) |
+
+ </div>

  Note that the output is overall smoother and has significantly less artifacting around edges in high-detail regions.

+ ## Licensing:
+ This VAE is available under the terms of the [CC BY-NC-SA 4.0 Deed](https://creativecommons.org/licenses/by-nc-sa/4.0/).
+ That means you are free to use this model for personal, non-commercial purposes.
+ You are also free to distribute this model alongside other (non-commercial) models, as long as you give credit.
+ Please also include the version number, in case future models are released.
+
+ ## Training details:
  Overall training is fundamentally similar to LDM. We used the same relative base weights for MAE, MSE, and LPIPS as LDM (following sd-vae-ft-mse in the case of LPIPS). The discriminator's weight in the loss objective is dynamically set so that the gradient norm for the discriminator is half that of the reconstruction loss, as in LDM. The discriminator is similar to LDM's, except reparameterized to a Wasserstein loss with gradient penalty and with its group norm layers replaced by layer norms.
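
As a rough illustration of the gradient-norm balancing described above, here is an LDM-style adaptive weight written as a PyTorch sketch (the actual training code is Flax-based; the epsilon and clamp values are LDM's defaults, and `last_layer` stands in for the decoder's final convolution weight):

```python
import torch

def adaptive_discriminator_weight(rec_loss, g_loss, last_layer, ratio=0.5):
    """Weight the generator-side adversarial loss so its gradient norm at the
    decoder's last layer is `ratio` times the reconstruction loss's gradient norm."""
    rec_grads = torch.autograd.grad(rec_loss, last_layer, retain_graph=True)[0]
    g_grads = torch.autograd.grad(g_loss, last_layer, retain_graph=True)[0]
    weight = ratio * torch.norm(rec_grads) / (torch.norm(g_grads) + 1e-4)
    return torch.clamp(weight, 0.0, 1e4).detach()

# Generator objective, schematically:
#   total = rec_loss + lpips_loss + adaptive_discriminator_weight(rec_loss, g_loss, last_layer) * g_loss
```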

  Training for version 1.0 used random square crops at various levels of downscale (Lanczos with antialiasing), randomly rotated and flipped. Training ran for 150,000 steps at a batch size of 32. EMA weights were accumulated using a decay similar to sd-vae-ft-mse's, scaled for our batch size, and are the release version of the model.
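
The card does not spell out how the EMA decay was "scaled for our batch size"; one common convention is to hold the decay per training sample fixed when the batch size changes, which is what this hedged sketch assumes (the reference decay and batch size are placeholders, not values from this project):

```python
import torch

@torch.no_grad()
def ema_update(ema_params, model_params, decay):
    """Standard EMA accumulation: ema <- decay * ema + (1 - decay) * param."""
    for e, p in zip(ema_params, model_params):
        e.mul_(decay).add_(p, alpha=1.0 - decay)

# Assumed scaling rule (keeps the per-sample decay constant across batch sizes):
#   decay_new = decay_ref ** (batch_new / batch_ref)
decay_ref, batch_ref = 0.9999, 256     # placeholder reference values
batch_new = 32                         # this model's batch size (from the card)
decay_new = decay_ref ** (batch_new / batch_ref)
```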

+ ## Credits:
+ Development and research led by @drhead.<br>
+ With research and development assistance by @RedHotTensors.<br>
+ And additional research assistance by @lodestones and Thessalo.<br>
+ Dataset curation by @lodestones and Bannanapuncakes, with additional curation by @RedHotTensors.<br>
  And thanks to dogarrowtype for system administration assistance.

+ ### Based on:
+ CompVis Latent Diffusion: https://github.com/CompVis/latent-diffusion/<br>
+ StabilityAI sd-vae-ft-mse: https://huggingface.co/stabilityai/sd-vae-ft-mse<br>
+ LPIPS by Richard Zhang, et al.: https://github.com/richzhang/PerceptualSimilarity<br>
+ Oklab by Björn Ottosson: https://bottosson.github.io/posts/oklab/<br>
+ fine-tune-models by Jonathan Chang: https://github.com/cccntu/fine-tune-models/<br>

+ ### Built on:
+ Flax by Google Brain: https://github.com/google/flax<br>
+ And Huggingface Diffusers: https://github.com/huggingface/diffusers<br>

  With deep thanks to the innumerable artists who released their works to the public for fair use in this non-commercial research project.