Add image-text-to-text pipeline tag

#2
by nielsr HF staff - opened

This PR updates the model card metadata to use the image-text-to-text pipeline tag. This tag better reflects the model's multimodal capabilities, including image captioning and visual question answering, as demonstrated in the provided examples and described in the paper. This change improves the model's discoverability on the Hub for users seeking vision-language models.
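As a rough illustration of what a change like this amounts to, the sketch below rewrites the `pipeline_tag` field in a model card's YAML front matter. It is not taken from the PR itself: the helper name `set_pipeline_tag`, the sample card text, and the previous tag value are assumptions made for the example, and it uses plain string handling rather than any Hub client library.

```python
import re

# Matches a YAML front matter block at the very start of a model card:
# a "---" line, zero or more metadata lines, then a closing "---" line.
FRONT_MATTER = re.compile(r"\A---\n(.*?\n)?---\n", re.DOTALL)


def set_pipeline_tag(readme: str, tag: str) -> str:
    """Return the card text with `pipeline_tag` set to `tag`.

    Hypothetical helper for illustration only; it handles simple
    one-line `key: value` metadata, not arbitrary YAML.
    """
    new_line = f"pipeline_tag: {tag}"
    match = FRONT_MATTER.match(readme)
    if match is None:
        # No front matter yet: create one containing only the tag.
        return f"---\n{new_line}\n---\n{readme}"

    lines = (match.group(1) or "").splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if line.startswith("pipeline_tag:"):
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)

    header = "".join(line + "\n" for line in lines)
    return f"---\n{header}---\n{readme[match.end():]}"


# Assumed sample card; the license and the old tag are placeholders.
card = (
    "---\n"
    "license: apache-2.0\n"
    "pipeline_tag: zero-shot-image-classification\n"
    "---\n"
    "# Model\n"
)
updated = set_pipeline_tag(card, "image-text-to-text")
print(updated)
```

Only the metadata block changes; the body of the card is passed through untouched, which matches the scope described above (a metadata-only update).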

Ready to merge
This branch is ready to get merged automatically.
