metadata
license: mit

FuseCap: Leveraging Large Language Models to Fuse Visual Data into Enriched Image Captions

A framework designed to generate semantically rich image captions.

Resources

  • 💻 Project Page: For more details, visit the official project page.

  • ๐Ÿ“ Read the Paper: You can find the paper here.

  • 🚀 Demo: Try out our BLIP-based model demo, trained using FuseCap and hosted on Hugging Face Spaces.

Upcoming Updates

The official codebase and trained models for this project will be released soon.

BibTeX

@misc{rotstein2023fusecap,
      title={FuseCap: Leveraging Large Language Models to Fuse Visual Data into Enriched Image Captions},
      author={Noam Rotstein and David Bensaid and Shaked Brody and Roy Ganz and Ron Kimmel},
      year={2023},
      eprint={2305.17718},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}