llama.cpp

#31
by goodasdgood - opened

This model does not run with llama.cpp:

!./llama-simple -m /content/flux1-dev-Q8_0.gguf -p "Hello my name is"

llama_model_loader: loaded meta data with 3 key-value pairs and 780 tensors from /content/flux1-dev-Q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = flux
llama_model_loader: - kv 1: general.quantization_version u32 = 2
llama_model_loader: - kv 2: general.file_type u32 = 8
llama_model_loader: - type f16: 476 tensors
llama_model_loader: - type q8_0: 304 tensors
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'flux'
llama_load_model_from_file: failed to load model
main: error: unable to load model

How was the GGUF conversion done?

I found another library that can run it: ComfyUI-GGUF.

llama.cpp only registers language model architectures, so it rejects the 'flux' architecture at load time. You can try stable-diffusion.cpp instead, which implements Stable Diffusion based models on top of the GGML library.
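Something along these lines should work (a sketch based on stable-diffusion.cpp's Flux documentation; the binary name, flags, and the encoder/VAE file paths are assumptions that may differ between versions):

# Flux needs its text encoders and VAE passed separately from the quantized diffusion model.
./sd --diffusion-model flux1-dev-Q8_0.gguf --vae ae.safetensors --clip_l clip_l.safetensors --t5xxl t5xxl_fp16.safetensors -p "a photo of a cat" --cfg-scale 1.0 --sampling-method euler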

How was the GGUF conversion done?

I believe you can quantize with stable-diffusion.cpp and ComfyUI-GGUF as well.

ComfyUI-GGUF is the repository used to create this; it's linked in the README as well. As @ecyht2 mentions above, both that node pack and stable-diffusion.cpp support both running and quantization. You can also use sd-forge to run the models.
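For anyone curious about the conversion step itself, the ComfyUI-GGUF repository documents a two-step flow, sketched here (the script name, the patched llama-quantize build, and the intermediate filename are assumptions from memory; check that repo's README for the exact steps):

# Step 1: convert the safetensors UNet to an unquantized GGUF.
python tools/convert.py --src flux1-dev.safetensors
# Step 2: quantize using llama-quantize built from llama.cpp with the repo's lcpp.patch applied.
./llama-quantize flux1-dev-BF16.gguf flux1-dev-Q8_0.gguf Q8_0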

city96 changed discussion status to closed
