---
license: mit
---

# FuseCap: Leveraging Large Language Models to Fuse Visual Data into Enriched Image Captions

A framework designed to generate semantically rich image captions.

## Resources

- 💻 **Project Page**: For more details, visit the official [project page](https://rotsteinnoam.github.io/FuseCap/).
- 📝 **Read the Paper**: You can find the paper [here](https://arxiv.org/abs/2305.17718).
- 🚀 **Demo**: Try out the [demo](https://huggingface.co/spaces/noamrot/FuseCap) of our BLIP-based model trained using FuseCap, hosted on Hugging Face Spaces.

## Upcoming Updates

The official codebase and trained models for this project will be released soon.

## BibTeX

```
@misc{rotstein2023fusecap,
      title={FuseCap: Leveraging Large Language Models to Fuse Visual Data into Enriched Image Captions},
      author={Noam Rotstein and David Bensaid and Shaked Brody and Roy Ganz and Ron Kimmel},
      year={2023},
      eprint={2305.17718},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```