---
license: cc-by-nc-4.0
language:
- en
tags:
- stable cascade
---

# Stable-Cascade FP16 fix

**A modified version of [Stable-Cascade](https://huggingface.co/stabilityai/stable-cascade) that is compatible with fp16 inference.**

**In theory, you don't need to download this model file: the same modification can be applied on the fly. This model is provided for experiments.**

## Demo

| FP16 | BF16 |
| - | - |
|![image/png](https://cdn-uploads.huggingface.co/production/uploads/630593e2fca1d8d92b81d2a1/fkWNY15JQbfh5pe1SY7wS.png)|![image/png](https://cdn-uploads.huggingface.co/production/uploads/630593e2fca1d8d92b81d2a1/XpfqkimqJTeDjggTaV4Mt.png)|

LPIPS difference: 0.088

| FP16 | BF16 |
| - | - |
|![image/png](https://cdn-uploads.huggingface.co/production/uploads/630593e2fca1d8d92b81d2a1/muOkoNjVK6CFv2rs6QyBr.png)|![image/png](https://cdn-uploads.huggingface.co/production/uploads/630593e2fca1d8d92b81d2a1/rrgb8yMuJDyjJu6wd366j.png)|

LPIPS difference: 0.012

## How

After checking the L1 norm of each hidden state, I found that the last block group (8, 24, 24, 8 <- this one) makes the hidden states grow larger and larger.
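
A check of this kind can be sketched with forward hooks. The snippet below is only an illustration, not the script used for this model: `mean_abs_activations` is a hypothetical helper, the toy `nn.Sequential` stands in for the real generator, and the mean absolute value is used as a size-normalized L1 norm.

```python
import torch
from torch import nn


def mean_abs_activations(model: nn.Module, *inputs, **kwargs) -> dict:
    """Run one forward pass and return {module_name: mean |output|}."""
    stats = {}
    handles = []

    def make_hook(name):
        def hook(module, args, output):
            # Only plain tensor outputs are recorded; tuples are skipped.
            if isinstance(output, torch.Tensor):
                stats[name] = output.detach().float().abs().mean().item()
        return hook

    for name, module in model.named_modules():
        if name:  # skip the root module itself
            handles.append(module.register_forward_hook(make_hook(name)))
    try:
        with torch.no_grad():
            model(*inputs, **kwargs)
    finally:
        # Always remove the hooks so the model is left unchanged.
        for handle in handles:
            handle.remove()
    return stats


# Toy stand-in for the generator (assumption, for illustration only).
toy = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))
norms = mean_abs_activations(toy, torch.randn(3, 4))
for layer_name, value in norms.items():
    print(f"{layer_name}: {value:.4f}")
```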

So I applied a transformation to the TimestepBlock that directly rescales the hidden state. (This is possible because it is not a residual block.)
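
Why such a rewrite is possible can be seen from a short derivation. It assumes the TimestepBlock computes $y = x \odot (1 + a) + b$, where $a$ and $b$ are the two halves of a linear mapper's output $W t + c$ for the timestep embedding $t$. The symbols $s$, $W$, $c$ are notation introduced here, and the transformation actually used for this model is the one in "stable_cascade.py", which may differ. For a target scale $s$:

$$
\begin{aligned}
s\,y &= s\,x \odot (1 + a) + s\,b \\
     &= x \odot \bigl(1 + \underbrace{s\,a + (s - 1)}_{a'}\bigr) + \underbrace{s\,b}_{b'}
\end{aligned}
$$

Under that assumption, multiplying the mapper's weight and bias by $s$ and then adding $s - 1$ to the bias of the $a$ half scales the block output by $s$ without touching any other layer. A plain residual block $y = x + f(x)$ has no such rewrite in general, because the skip path passes $x$ through unscaled.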

The transformation is implemented in the modified "stable_cascade.py". You can put the file into the stable-cascade branch of kohya-ss/sd-scripts and uncomment the relevant lines to check the weights or run the conversion yourself.

### FP8

Some people may know the FP8 quantization used to run SDXL inference on low-VRAM cards. The technique can be applied to this model too.<br>
But since the last block group is basically ruined, it is recommended to skip it:<br>
```python
import torch

# Module types whose weights are cast to FP8.
FP8_TARGETS = (torch.nn.Linear, torch.nn.Conv2d, torch.nn.MultiheadAttention)

for name, module in generator_c.named_modules():
    # Skip the last block group and keep it in its original precision.
    if "up_blocks.1" in name:
        continue
    if isinstance(module, FP8_TARGETS):
        module.to(torch.float8_e5m2)
```

This sample code should convert about 70% of the weights to fp8. (Using FP8 weights with a scale is a better solution, and implementing that is recommended.)

I have tried other transformation settings that are friendlier to FP8, but they make the difference from the original model more significant.

FP8 Demo (Same Seed):
![image/png](https://cdn-uploads.huggingface.co/production/uploads/630593e2fca1d8d92b81d2a1/wPoZeWGGhcPMck45--y_X.png)

## Notice

The modified model is not compatible with LoRA/LyCORIS weights trained on the original weights.<br>
(Actually it can be made compatible by applying the same transformation to them. I'm considering rewriting the script so that it decides what to do based on key names.)

Also, ControlNets will not be compatible unless you apply the needed transformation to them as well.

I don't want to do all of this by myself, so I hope others will.

## License

Stable-Cascade is published under a non-commercial license, so I publish this model under CC-BY-NC 4.0.

**The source code used to make this model is published under the Apache-2.0 license.**