---
license: mit
---
# FuseCap: Leveraging Large Language Models to Fuse Visual Data into Enriched Image Captions
A framework designed to generate semantically rich image captions.
## Resources
- **Project Page**: For more details, visit the official [project page](https://rotsteinnoam.github.io/FuseCap/).
- **Read the Paper**: You can find the paper [here](https://arxiv.org/abs/2305.17718).
- **Demo**: Try out the [demo](https://huggingface.co/spaces/noamrot/FuseCap) of our BLIP-based model trained using FuseCap, hosted on Hugging Face Spaces.
## Upcoming Updates
The official codebase and trained models for this project will be released soon.
## BibTeX
```bibtex
@misc{rotstein2023fusecap,
title={FuseCap: Leveraging Large Language Models to Fuse Visual Data into Enriched Image Captions},
author={Noam Rotstein and David Bensaid and Shaked Brody and Roy Ganz and Ron Kimmel},
year={2023},
eprint={2305.17718},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```