---
license: mit
---
# FuseCap: Leveraging Large Language Models to Fuse Visual Data into Enriched Image Captions

A framework designed to generate semantically rich image captions.

## Resources

- 💻 **Project Page**: For more details, visit the official [project page](https://rotsteinnoam.github.io/FuseCap/).

- ๐Ÿ“ **Read the Paper**: You can find the paper [here](https://arxiv.org/abs/2305.17718).
    
- 🚀 **Demo**: Try the [demo](https://huggingface.co./spaces/noamrot/FuseCap) of our BLIP-based model trained using FuseCap, hosted on Hugging Face Spaces.

## Upcoming Updates

The official codebase and trained models for this project will be released soon.

## BibTeX

```bibtex
@misc{rotstein2023fusecap,
      title={FuseCap: Leveraging Large Language Models to Fuse Visual Data into Enriched Image Captions}, 
      author={Noam Rotstein and David Bensaid and Shaked Brody and Roy Ganz and Ron Kimmel},
      year={2023},
      eprint={2305.17718},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```