---
tags:
- text-to-image
- stable-diffusion
license: apache-2.0
language:
- en
library_name: diffusers
---

# EasyRef Model Card
[**Project Page**](https://easyref-gen.github.io/) **|** [**Paper**](https://arxiv.org/pdf/2412.09618) **|** [**Code**](https://github.com/TempleX98/EasyRef) **|** [🤗 **Demo**](https://huggingface.co./spaces/zongzhuofan/EasyRef)
## Introduction

EasyRef models the visual elements shared across a group of reference images with a single generalist multimodal LLM, in a zero-shot setting.
## Demos

More visualization examples are available on our [project page](https://easyref-gen.github.io/).

### Style, Identity, and Character Preservation

### Comparison with IP-Adapter

### Compatibility with ControlNet

## Inference

We provide the inference code of EasyRef with SDXL in [**easyref_demo**](https://github.com/TempleX98/EasyRef/blob/main/easyref_demo.ipynb).

### Usage Tips

- EasyRef performs best when given multiple (more than 2) reference images.
- For better identity preservation, we strongly recommend uploading multiple square face images in which the face occupies the majority of each image.
- Multimodal prompts (reference images together with a non-empty text prompt) yield better results.
- We set `scale=1.0` by default. Lowering the `scale` value leads to more diverse but less consistent generation results.

## Cite

If you find EasyRef useful for your research and applications, please cite us using this BibTeX:

```bibtex
@article{easyref,
  title={EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM},
  author={Zong, Zhuofan and Jiang, Dongzhi and Ma, Bingqi and Song, Guanglu and Shao, Hao and Shen, Dazhong and Liu, Yu and Li, Hongsheng},
  journal={arXiv preprint arXiv:2412.09618},
  year={2024}
}
```
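
## Appendix: Preparing Square Reference Images

The usage tips above recommend square face crops. As a minimal sketch (not part of the EasyRef codebase; `center_square_box` is a hypothetical helper name), the largest centered square crop box for an image can be computed like this:

```python
def center_square_box(width, height):
    """Return (left, top, right, bottom) for the largest centered square crop.

    The box can be passed directly to PIL's Image.crop() before resizing
    a reference image to the model's expected input resolution.
    """
    side = min(width, height)          # square side length is the shorter edge
    left = (width - side) // 2         # horizontal offset to center the square
    top = (height - side) // 2         # vertical offset to center the square
    return (left, top, left + side, top + side)

# Example: crop box for a 1280x720 landscape reference image
box = center_square_box(1280, 720)
# box == (280, 0, 1000, 720): a 720x720 square centered horizontally
```

Cropping each reference to such a square before upload helps ensure the face occupies most of the image, per the identity-preservation tip.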