---
license: cc-by-nc-4.0
language:
- en
tags:
- stable cascade
---
# Stable-Cascade FP16 fix
**A modified version of [Stable-Cascade](https://huggingface.co./stabilityai/stable-cascade) that is compatible with fp16 inference**
## Demo
| FP16| BF16|
| - | - |
|||
LPIPS difference: 0.088
| FP16 | BF16|
| - | - |
|||
LPIPS difference: 0.012
## How
After checking the L1 norm of each hidden state, I found that the last block group (8, 24, 24, 8 <- this one) makes the hidden states grow larger and larger.
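
A check of this kind can be sketched with PyTorch forward hooks. This is a minimal sketch, not the script actually used for this model: `record_l1_norms` is a hypothetical helper, the "L1 norm" here is the mean absolute activation (L1 norm divided by element count), and the two-layer `toy` network only stands in for the real model.

```python
import torch
from torch import nn


def record_l1_norms(model: nn.Module, *inputs) -> dict[str, float]:
    """Run one forward pass and return the mean |activation| of each leaf module."""
    stats: dict[str, float] = {}
    handles = []

    def make_hook(name):
        def hook(module, args, output):
            if isinstance(output, torch.Tensor):
                # Measure in float32 so the measurement itself cannot overflow.
                stats[name] = output.detach().float().abs().mean().item()
        return hook

    for name, module in model.named_modules():
        if len(list(module.children())) == 0:  # leaf modules only
            handles.append(module.register_forward_hook(make_hook(name)))
    try:
        with torch.no_grad():
            model(*inputs)
    finally:
        # Always remove the hooks so the model is left unchanged.
        for handle in handles:
            handle.remove()
    return stats


# Toy stand-in (assumption): the second layer inflates its input 1000x.
toy = nn.Sequential(nn.Identity(), nn.Linear(4, 4, bias=False))
with torch.no_grad():
    toy[1].weight.copy_(torch.eye(4) * 1000.0)

norms = record_l1_norms(toy, torch.ones(2, 4))
for layer_name, value in norms.items():
    print(f"{layer_name}: {value:.1f}")
```

Sorting the resulting dictionary by value, or printing it in execution order as above, makes it easy to see where the magnitudes start to approach the fp16 limit of 65504.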
So I apply a transformation to the TimestepBlock to directly modify the scale of the hidden state. (This is possible because it is not a residual block.)
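
To illustrate why a non-residual block can be rescaled by editing its weights alone, here is a small scalar sketch. It assumes the block computes `x * (1 + a) + b`, with `a` and `b` produced by a linear map of the timestep embedding; that form, and the helper names `mapper`, `timestep_block` and `rescale`, are assumptions for illustration. The actual transformation is the one in the modified "stable_cascade.py".

```python
def mapper(weight, bias, t):
    """Linear map from a scalar timestep embedding t to the pair (a, b)."""
    (wa, wb), (ba, bb) = weight, bias
    return wa * t + ba, wb * t + bb


def timestep_block(x, t, weight, bias):
    """Assumed form of the block: scale-and-shift modulation of x."""
    a, b = mapper(weight, bias, t)
    return x * (1 + a) + b


def rescale(weight, bias, k):
    """Return parameters whose block output is k times the original output."""
    (wa, wb), (ba, bb) = weight, bias
    # k * (1 + a) = 1 + (k * a + k - 1), so the 'a' bias absorbs the extra k - 1.
    return (k * wa, k * wb), (k * ba + (k - 1), k * bb)


weight, bias = (0.5, -2.0), (0.25, 3.0)
x, t, k = 8.0, 2.0, 0.125

original = timestep_block(x, t, weight, bias)
new_weight, new_bias = rescale(weight, bias, k)
scaled = timestep_block(x, t, new_weight, new_bias)
print(original, scaled)
```

The sketch only covers the block itself. With a residual connection, the untouched skip path would not be scaled, which is why the same trick does not carry over to residual blocks.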
How the transformation is done is written in the modified "stable_cascade.py". You can put the file into the stable-cascade branch of kohya-ss/sd-scripts and uncomment things to check the weights or do the conversion yourself.
### FP8
Some people may know about FP8 quantization for running SDXL inference on low-VRAM cards. The technique can be applied to this model too.<br>
But since the last block group is basically ruined, it is recommended to skip the last block group:<br>
```python
import torch

# `generator_c` is assumed to be the already-loaded Stage C model.
for name, module in generator_c.named_modules():
    # Skip the last block group and keep it in its original dtype.
    if "up_blocks.1" in name:
        continue
    if isinstance(module, (torch.nn.Linear, torch.nn.Conv2d, torch.nn.MultiheadAttention)):
        module.to(torch.float8_e5m2)
```
This sample code should convert 70% of the weights to fp8. (Using FP8 weights with a scale is a better solution, and implementing that is recommended.)
I have tried different transformation settings that are more FP8-friendly, but the difference from the original model is more significant.
FP8 Demo (Same Seed):

## Notice
The modified model will not be compatible with LoRA/LyCORIS trained on the original weights. <br>
(Actually it can be: just apply the same transformation. I'm considering rewriting a version that uses key names to determine what to do.)
ControlNets will not be compatible either, unless you also apply the needed transformation to them.
I don't want to do all of this myself, so I hope others will.
## License
Stable-Cascade is published under a non-commercial license, so I use CC-BY-NC 4.0 to publish this model.
**The source code used to make this model is published under the apache-2.0 license**