---
license: bsd
---

# InteractDiffusion Diffuser Implementation

[Project Page](https://jiuntian.github.io/interactdiffusion) | [Paper](https://arxiv.org/abs/2312.05849) | [WebUI](https://github.com/jiuntian/sd-webui-interactdiffusion) | [Demo](https://huggingface.co./spaces/interactdiffusion/interactdiffusion) | [Video](https://www.youtube.com/watch?v=Uunzufq8m6Y) | [Diffuser](https://huggingface.co./interactdiffusion/diffusers-v1-2) | [Colab](https://colab.research.google.com/drive/1Bh9PjfTylxI2rbME5mQJtFqNTGvaghJq?usp=sharing)

## How to Use

```python
from diffusers import DiffusionPipeline
import torch

# Load the InteractDiffusion pipeline; the custom pipeline code is fetched
# from the Hub, so trust_remote_code=True is required.
pipeline = DiffusionPipeline.from_pretrained(
    "interactdiffusion/diffusers-v1-2",
    trust_remote_code=True,
    variant="fp16",
    torch_dtype=torch.float16,
)
pipeline = pipeline.to("cuda")

# Each interaction is a (subject, action, object) triplet with normalized
# [x0, y0, x1, y1] bounding boxes for the subject and the object.
images = pipeline(
    prompt="a person is feeding a cat",
    interactdiffusion_subject_phrases=["person"],
    interactdiffusion_object_phrases=["cat"],
    interactdiffusion_action_phrases=["feeding"],
    interactdiffusion_subject_boxes=[[0.0332, 0.1660, 0.3359, 0.7305]],
    interactdiffusion_object_boxes=[[0.2891, 0.4766, 0.6680, 0.7930]],
    interactdiffusion_scheduled_sampling_beta=1,
    output_type="pil",
    num_inference_steps=50,
).images

images[0].save("out.jpg")
```

For more information, please check the [project homepage](https://jiuntian.github.io/interactdiffusion/). A hedged sketch of conditioning on multiple interactions in a single call is included at the end of this card.

## Citation

```bibtex
@inproceedings{hoe2023interactdiffusion,
    title={InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models},
    author={Jiun Tian Hoe and Xudong Jiang and Chee Seng Chan and Yap-Peng Tan and Weipeng Hu},
    year={2024},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
}
```

## Acknowledgement

This work is built on the codebases of [GLIGEN](https://github.com/gligen/GLIGEN) and [LDM](https://github.com/CompVis/latent-diffusion).
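
## Specifying Multiple Interactions

The interaction arguments in "How to Use" are list-valued, so a single call can in principle describe more than one (subject, action, object) triplet. The snippet below is a minimal sketch under that assumption: it supposes that the i-th entry of each phrase and box list describes the i-th interaction, and the prompt, phrases, and box coordinates here are purely illustrative, not values from the paper or project page.

```python
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "interactdiffusion/diffusers-v1-2",
    trust_remote_code=True,
    variant="fp16",
    torch_dtype=torch.float16,
).to("cuda")

# Hypothetical two-interaction example: "person feeding cat" and
# "person holding umbrella". Assumes one list entry per interaction;
# boxes are normalized [x0, y0, x1, y1] values chosen for illustration.
images = pipeline(
    prompt="a person is feeding a cat while holding an umbrella",
    interactdiffusion_subject_phrases=["person", "person"],
    interactdiffusion_object_phrases=["cat", "umbrella"],
    interactdiffusion_action_phrases=["feeding", "holding"],
    interactdiffusion_subject_boxes=[
        [0.0332, 0.1660, 0.3359, 0.7305],
        [0.0332, 0.1660, 0.3359, 0.7305],
    ],
    interactdiffusion_object_boxes=[
        [0.2891, 0.4766, 0.6680, 0.7930],
        [0.1000, 0.0500, 0.3000, 0.2500],
    ],
    interactdiffusion_scheduled_sampling_beta=1,
    output_type="pil",
    num_inference_steps=50,
).images

images[0].save("out_multi.jpg")
```

If multiple interactions are not supported in a single call, the same effect can be approximated by keeping one triplet per call as in the original example above.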