What does "scaled" mean? fp8 model only?
Hi!
What does "scaled" mean? Is it something different from a normal fp8 version?
Also, could a model-only version be uploaded, to use with separate clip models and save disk space?
Yes please, you said it was experimental and low-vram, but no specifics have been given on what exactly you did with it. I'm also curious if it's possible to use with Flux-Dev, or if it only works with SD3.5.
Thanks for your time.
It means the fp8 weights are scaled. For example, if your weight values go from -1.0 to 1.0 but fp8 can represent values from -448 to 448, a lot of the fp8 range is wasted if you just do a simple conversion.
The solution is to multiply the weights by 448 and then store the value (1/448) in the checkpoint for unscaling during inference. This means less data is lost.
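To make that concrete, here is a rough sketch in plain Python. It is not the code used to build the checkpoint; it only simulates rounding to the e4m3 flavour of fp8 (3 mantissa bits, max 448). The function names are made up for illustration, and picking the scale from the largest absolute weight in the list is my assumption (for weights in -1.0 to 1.0 it gives the same 448 and 1/448 as above).

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value in the e4m3 fp8 format


def round_to_fp8_e4m3(x):
    """Simulate rounding a Python float to the nearest fp8 e4m3 value."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), FP8_E4M3_MAX)   # saturate instead of overflowing
    _, exp = math.frexp(a)          # a = m * 2**exp with 0.5 <= m < 1
    e = max(exp - 1, -6)            # below 2**-6 the format is subnormal
    step = 2.0 ** (e - 3)           # 3 mantissa bits: 8 steps per power of two
    return sign * min(round(a / step) * step, FP8_E4M3_MAX)


def quantize_naive(weights):
    """Simple conversion: round each weight straight to fp8."""
    return [round_to_fp8_e4m3(w) for w in weights]


def quantize_scaled(weights):
    """Stretch the weights over the fp8 range, then round.

    Returns the stored fp8 values plus the inverse scale that would be
    saved next to them for unscaling at inference time.
    Assumption: one scale for the whole list, taken from max |w|.
    """
    max_abs = max(abs(w) for w in weights)
    scale = FP8_E4M3_MAX / max_abs if max_abs > 0 else 1.0
    stored = [round_to_fp8_e4m3(w * scale) for w in weights]
    return stored, 1.0 / scale


def dequantize(stored, inv_scale):
    """Undo the scaling, as would happen during inference."""
    return [q * inv_scale for q in stored]


def mean_relative_error(original, restored):
    """Average of |o - r| / |o|; assumes no weight is exactly zero."""
    return sum(abs(o - r) / abs(o) for o, r in zip(original, restored)) / len(original)


# Hypothetical weights in the -1.0 to 1.0 range, including some tiny ones.
weights = [1.0, -0.5, 0.3, 0.01, 0.003, -0.0007]

naive = quantize_naive(weights)
stored, inv_scale = quantize_scaled(weights)
scaled = dequantize(stored, inv_scale)

print("naive :", naive, mean_relative_error(weights, naive))
print("scaled:", scaled, mean_relative_error(weights, scaled))
```

In this simulation the gain shows up mostly on the small weights: without scaling they land in the coarse bottom end of fp8 (or round to zero), while the large weights come out about the same either way.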
Oh okay.
Thanks, makes sense (that's pretty cool too).
The scaled fp8 checkpoint has compatibility issues with LoRAs trained on the original checkpoint. The results are significantly different.