---
license: cc-by-4.0
datasets:
- FreedomIntelligence/ALLaVA-4V
pipeline_tag: image-text-to-text
library_name: prismcaptioner
---
# PrismCaptioner Model Card

**Model details**

PrismCaptioners are open-source captioners with the LLaVA architecture, fine-tuned on the GPT4V-assisted dataset [ALLaVA](https://huggingface.co./datasets/FreedomIntelligence/ALLaVA-4V). We have released [PrismCaptioner-7B](https://huggingface.co./Yuxuan-Qiao/PrismCaptioner-7B) and [PrismCaptioner-2B](https://huggingface.co./Yuxuan-Qiao/PrismCaptioner-2B).

PrismCaptioner-2B details

- **Vision Backbone:** google/siglip-so400m-patch14-384
- **Language Backbone:** internlm/internlm2-1_8b
- **Dataset:** 1x ALLaVA-Caption-[LAION/VFLAN], 2x Evol-Instruct-GPT4-Turbo-143K

**Paper and codebase for more information:**
[[Paper](https://arxiv.org/abs/2406.14544)] [[Code](https://github.com/SparksJoe/Prism)]

**Intended uses**

- **Perception Module:** The model can be integrated into [Prism](https://github.com/SparksJoe/Prism) as a perception module that extracts visual information, so that an external LLM can solve vision-language tasks from its output.
- **Effective Captioner:** The model can produce high-quality captions for given images.

**Model usage**

Clone the [Prism](https://github.com/SparksJoe/Prism) repo and complete the [preparation](https://github.com/SparksJoe/Prism/tree/main?tab=readme-ov-file#preparation) steps. You can then use PrismCaptioners by following the [usage](https://github.com/SparksJoe/Prism/blob/main/README.md#usage) instructions or the demo below.

```python
# Run from the root of the Prism repo so that `decouple` and `assets/` resolve.
from decouple import supported_VLM

# Instantiate the 2B captioner from the registry of supported models.
model = supported_VLM['prismcaptioner-2b']()

# The input is a list holding the image path followed by the text prompt.
res = model.generate([
    'assets/case1.png',
    'Given the image below, please provide a detailed description of what you see.',
])
print(res)
```
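The two intended uses can be sketched as small helpers around the `generate([image_path, prompt])` call from the demo. This is a minimal sketch, not part of the Prism codebase: `caption_images`, `answer_with_llm`, and the `reason` callable are hypothetical names introduced here for illustration, and the sketch assumes `generate` returns the caption as text. The `model` argument is any object with that `generate` method, for example the one built from `supported_VLM['prismcaptioner-2b']()`.

```python
from typing import Callable, Dict, Iterable

# Same prompt as in the demo.
DEFAULT_PROMPT = (
    "Given the image below, please provide a detailed description of what you see."
)


def caption_images(model, image_paths: Iterable[str],
                   prompt: str = DEFAULT_PROMPT) -> Dict[str, str]:
    """Caption every image with the same prompt and return {path: caption}.

    Hypothetical helper: `model` only needs a generate([image_path, prompt])
    method, as in the demo.
    """
    captions = {}
    for path in image_paths:
        captions[path] = model.generate([path, prompt])
    return captions


def answer_with_llm(model, reason: Callable[[str], str], image_path: str,
                    question: str, prompt: str = DEFAULT_PROMPT) -> str:
    """Two-stage flow: describe the image first, then let an external LLM answer.

    `reason` is a hypothetical callable that wraps whichever LLM you use;
    it receives a single text prompt and returns the answer.
    """
    caption = model.generate([image_path, prompt])
    llm_input = f"Image description:\n{caption}\n\nQuestion: {question}"
    return reason(llm_input)
```

Keeping the captioner and the LLM behind plain callables means either stage can be swapped out without touching the other, which mirrors the perception-module role described under intended uses.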