License: MIT
FuseCap: Leveraging Large Language Models to Fuse Visual Data into Enriched Image Captions
A framework designed to generate semantically rich image captions.
Resources
Project Page: For more details, visit the official project page.
Read the Paper: The paper is available on arXiv (arXiv:2305.17718).
Demo: Try our demo of a BLIP-based model trained with FuseCap, hosted on Hugging Face Spaces.
Upcoming Updates
The official codebase and trained models for this project will be released soon.
BibTeX
@misc{rotstein2023fusecap,
  title={FuseCap: Leveraging Large Language Models to Fuse Visual Data into Enriched Image Captions},
  author={Noam Rotstein and David Bensaid and Shaked Brody and Roy Ganz and Ron Kimmel},
  year={2023},
  eprint={2305.17718},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}