Is it possible to use Karlo's prior with the Stable Diffusion Variations model?

#6
by beyondarmonia - opened

Karlo's prior takes text embeddings as input and converts them to CLIP image embeddings.

The SD image variations model takes CLIP image embeddings as conditioning to generate an actual image.

Is it possible to combine the two (effectively replacing Karlo's denoiser with SD) as I'm imagining, or am I missing something?

Hi @beyondarmonia, I think it's very reasonable to combine the prior (from Karlo) with the decoder (from SD). This PR (https://github.com/huggingface/diffusers/issues/1808) seems to implement your idea.
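
To make the data flow concrete, here is a toy sketch of the hand-off between the two stages. It is not the code from that PR and it does not call either model: `toy_prior`, `toy_decoder`, and `text_to_image` are hypothetical stand-ins, and `EMBED_DIM = 768` is an assumed width for the shared CLIP image-embedding space. The only point is that the prior's output has to live in the same embedding space, with the same dimensionality, that the decoder was trained to condition on.

```python
import math
import random

EMBED_DIM = 768  # assumed width of the shared CLIP image-embedding space


def toy_prior(text_embedding, seed=0):
    """Hypothetical stand-in for Karlo's prior: text embedding -> image embedding."""
    rng = random.Random(seed)
    # A real prior runs a diffusion process; this only perturbs and renormalizes.
    noisy = [x + rng.gauss(0.0, 0.01) for x in text_embedding]
    norm = math.sqrt(sum(x * x for x in noisy))
    return [x / norm for x in noisy]


def toy_decoder(image_embedding, height=4, width=4):
    """Hypothetical stand-in for the SD variations decoder: embedding -> image."""
    if len(image_embedding) != EMBED_DIM:
        raise ValueError("decoder expects a %d-d image embedding" % EMBED_DIM)
    # A real decoder denoises latents conditioned on the embedding;
    # this just tiles embedding values into a small grid.
    return [
        [image_embedding[(r * width + c) % EMBED_DIM] for c in range(width)]
        for r in range(height)
    ]


def text_to_image(text_embedding):
    """Chain the two stages: the prior's output is the decoder's conditioning."""
    image_embedding = toy_prior(text_embedding)
    return toy_decoder(image_embedding)


text_embedding = [1.0] * EMBED_DIM
image = text_to_image(text_embedding)
print(len(image), len(image[0]))
```

If the two models used different CLIP variants, the dimension check in `toy_decoder` is where the mismatch would show up, and an extra projection between the stages would be needed.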

Thanks for the swift reply, @shkim-kb. I should have known someone would have already implemented it. Happy to know it works.

That PR is exactly what I was looking for. Thank you again.

Saehoon Kim, will you eventually be able to make a Space for this hybrid model? I would love to try it out!