A request, or rather a wish, for what the final specification could look like.
It's just a fantasy, so please take it lightly.
- This "sayakpaul/flux.1-dev-nf4-pkg" repo (and repos with the same specifications) be available from InferenceClient().
- Add an option to from_pretrained to make it easier to handle quantized files.
- Add a function to fetch missing files to from_single_file like this. (Because the files distributed only unet (transformer) are already the overwhelming majority in Flux at this point in time. from_single_file, which is probably supposed to be for beginners, is now the most difficult to handle... Even without the SD1.5 CLIP problem. It's a relief that the majority of the distribution is still in an unet with torch.float8_* for now.)
unet_nf4 = "https://huggingface.co./lllyasviel/flux1-dev-bnb-nf4/blob/main/flux1-dev-bnb-nf4-v2.safetensors" # unet only
base_repo_nf4 = "sayakpaul/flux.1-dev-nf4-pkg" # get the rest from here
quantized_type = "NF4" # A kind of hardcoded presets
pipe = from_single_file(unet_nf4, base_model=base_repo_nf4, qtype=quantized_type)
pipe.save_pretrained("whole_nf4ed.safetensors")
This would make Flux conversion manageable with 16GB of RAM. On top of that, SD 1.5-class models could just barely be made to run on integrated GPUs.
I mean, the transformer directory in this repo already has the quantized checkpoint. Why use the one with unet_nf4?
> pipe.save_pretrained("whole_nf4ed.safetensors")

This could be possible right now with https://github.com/huggingface/diffusers/pull/9213/.
Essentially, you don't need from_single_file() if you already have the quantized checkpoint, which is why I shared them in this repository. It makes the workflow easier.
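For reference, plugging the pre-quantized components from this repo into the base pipeline looks roughly like this (a sketch assuming the bitsandbytes support from the PR above; the subfolder names follow this repo's layout):

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel
from transformers import T5EncoderModel

repo = "sayakpaul/flux.1-dev-nf4-pkg"

# The NF4-quantized components shared in this repo.
transformer = FluxTransformer2DModel.from_pretrained(
    repo, subfolder="transformer", torch_dtype=torch.bfloat16
)
text_encoder_2 = T5EncoderModel.from_pretrained(
    repo, subfolder="text_encoder_2", torch_dtype=torch.bfloat16
)

# Everything else (VAE, CLIP text encoder, scheduler) comes from the base repo.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    text_encoder_2=text_encoder_2,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()
```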
diffusers doesn't advocate for all the checkpoints to be clubbed into a single file. We prefer that each model-level component be distributed separately, which helps isolate loading and improves flexibility.
> I mean, the transformer directory in this repo already has the quantized checkpoint. Why use the one with unet_nf4?
I'm sorry!
You are right, lllyasviel's file was the same as the official dev unet.
I was assuming a case like the one below. (It is a GGUF, not an NF4, but it is just an example.)
If the situation turns out the same as with SDXL, the number of single safetensors files distributed this way will easily exceed 1,000.
unet_nf4 = "https://huggingface.co./Zuntan/FluxDev8AnimeNsfw/blob/main/FluxDev8AnimeNsfw%5Bfca_style%2Canime%5D-Q4_K_S.gguf"
> pipe.save_pretrained("whole_nf4ed.safetensors") this could be possible right now with https://github.com/huggingface/diffusers/pull/9213/.

Great!
> diffusers doesn't advocate for all the checkpoints to be clubbed into a single file. We prefer that each model-level component be distributed separately, which helps isolate loading and improves flexibility.
Separate files have many advantages, such as making it easy to swap out parts of a model, and they are now partly supported in ComfyUI and WebUI as well. But for sites that manage models on a per-file basis, they are probably too much of a hassle to deal with.
In any case, models trained with Diffusers get merged into a single safetensors file and uploaded to Civitai, then come back to HF in Diffusers format, and one day get merged back into a single safetensors file... and so on. That ecosystem is real, and the easier the interconversion, the better.
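For the single-file to Diffusers direction, what I do today looks roughly like this (a sketch with SDXL as the example; the paths are placeholders, and the reverse direction needs the conversion script linked below):

```python
from diffusers import StableDiffusionXLPipeline

# Single-file checkpoint (e.g. downloaded from Civitai) -> Diffusers folder layout.
pipe = StableDiffusionXLPipeline.from_single_file("merged_model.safetensors")
pipe.save_pretrained("merged_model_diffusers", safe_serialization=True)
```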
P.S. Few people know that these scripts exist.
https://github.com/huggingface/diffusers/blob/main/scripts/convert_diffusers_to_original_sdxl.py
P.S.
I am new to generative AI (less than 6 months), so I don't have a firsthand, accurate picture of how things have changed, but I have noticed that the number of model makers putting their backups on HF has been increasing again, probably because the situation on Civitai has changed recently. (Models, and even whole accounts, get deleted very often, the site hits server errors very often, the lag between uploading and moderation/publication keeps getting longer, and so on.)
Either way, I think it would be more convenient for everyone, including me, if these formats could be converted easily.
P.S.
Unlike the GGUF mentioned above, a real NF4 checkpoint that is not the original dev model, made for other software, has been uploaded to HF, so I'm listing it as a sample for testing from_single_file.
https://huggingface.co./Zuntan/iNiverseFluxV11-8step/blob/main/iNiverseFluxV11-8step-nf4.safetensors # unet (transformer) only in WebUI or ComfyUI format
https://huggingface.co./Zuntan/iNiverseFluxV11-8step/blob/main/iNiverseFluxV11-8step-nf4-AIO.safetensors # all-in-one in WebUI or ComfyUI format
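If the wished-for support lands, the test might look something like this (a sketch; whether from_single_file can actually consume an NF4 single file like the one above is exactly the open question):

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel

# unet-only NF4 file in WebUI/ComfyUI format (from the sample above)
url = "https://huggingface.co./Zuntan/iNiverseFluxV11-8step/blob/main/iNiverseFluxV11-8step-nf4.safetensors"

# This call already works for plain bf16/fp8 single files;
# NF4 is the part that would need the new handling.
transformer = FluxTransformer2DModel.from_single_file(url, torch_dtype=torch.bfloat16)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
```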
So far, counting other sites as well, NF4 and GGUF distributions are not that large as a percentage; unet-only files in torch fp8 are the mainstream, and people often quantize them themselves after downloading.
Since GGUF is worse in latency but better in the quality of the generated images, users do not seem to be settling on a standard quantization format any time soon.
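For reference, "quantizing after downloading" would look roughly like this once that PR lands (a sketch assuming the BitsAndBytesConfig API mirrors the one in transformers):

```python
import torch
from diffusers import BitsAndBytesConfig, FluxTransformer2DModel

# Quantize the full-precision weights to NF4 on the fly while loading.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)
```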
By the way, some versions of Diffusers have a bug that aborts when torch fp8 is specified as the dtype for loading.
A file saved in torch fp8 can be loaded, but if you try to keep it in VRAM or RAM as fp8, it aborts with a message about the storage format. The problem does not occur when loading fp8 files as bf16.
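Roughly what I mean (a hypothetical repro; the path is a placeholder):

```python
import torch
from diffusers import FluxTransformer2DModel

path = "flux1-dev-fp8.safetensors"  # placeholder: a unet-only file saved in torch fp8

# Loading the fp8 file and upcasting to bf16 works fine.
transformer = FluxTransformer2DModel.from_single_file(path, torch_dtype=torch.bfloat16)

# Keeping fp8 as the in-memory dtype aborts with an error about the storage format.
transformer = FluxTransformer2DModel.from_single_file(path, torch_dtype=torch.float8_e4m3fn)
```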
I could not distinguish whether this bug comes from Diffusers or Transformers.
Sorry if this has already been fixed in the latest dev version.