Add image-text-to-text pipeline tag

This PR changes the model card's `pipeline_tag` from `zero-shot-image-classification` to `image-text-to-text`. The new tag better reflects the model's multimodal capabilities, including image captioning and visual question answering, as demonstrated in the provided examples and described in the paper. The change should make the model easier to find on the Hub for users looking for vision-language models. The diff also reorders the metadata keys, puts the widget `src` URL on a single line, and shortens the SigLIP 2 paper link to `hf.co`.
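A quick way to sanity-check a metadata change like this is to read the front matter back and confirm the tag. The following is a minimal sketch, not Hub tooling: `read_front_matter` and `NEW_CARD` are illustrative names, and the parser only handles top-level scalar `key: value` lines, so it is not a full YAML parser.

```python
def read_front_matter(card_text: str) -> dict[str, str]:
    """Return top-level scalar `key: value` pairs from a card's YAML front matter.

    Hypothetical helper for illustration: nested mappings and list items
    (such as `tags` and `widget` entries) are skipped, not parsed.
    """
    lines = card_text.splitlines()
    # Front matter must open with a `---` line.
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter
            break
        # Skip indented lines, list items, and lines without a key.
        if line.startswith((" ", "-")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if value:  # keys such as `tags:` have no scalar value
            fields[key.strip()] = value
    return fields


# Front matter as it appears on the "+" side of this PR's diff.
NEW_CARD = """\
---
library_name: transformers
license: apache-2.0
pipeline_tag: image-text-to-text
tags:
- vision
widget:
- src: https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg
  candidate_labels: bee in the sky, bee on the flower
  example_title: Bee
---
# SigLIP 2 Giant
"""

meta = read_front_matter(NEW_CARD)
print(meta["pipeline_tag"])
```

For anything beyond a spot check, a real YAML parser would be the safer choice, since the widget and tag lists are ignored here.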

Files changed (1)

README.md +7 -8

@@ -1,19 +1,18 @@
 ---
+library_name: transformers
 license: apache-2.0
+pipeline_tag: image-text-to-text
 tags:
 - vision
 widget:
-  - src: >-
-      https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg
-    candidate_labels: bee in the sky, bee on the flower
-    example_title: Bee
-library_name: transformers
-pipeline_tag: zero-shot-image-classification
+- src: https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg
+  candidate_labels: bee in the sky, bee on the flower
+  example_title: Bee
 ---
 # SigLIP 2 Giant
-[SigLIP 2](https://huggingface.co/papers/2502.14786) extends the pretraining objective of
+[SigLIP 2](https://hf.co/papers/2502.14786) extends the pretraining objective of
 [SigLIP](https://huggingface.co/papers/2303.15343) with prior, independently developed techniques
 into a unified recipe, for improved semantic understanding, localization, and dense features.
@@ -99,4 +98,4 @@ Evaluation of SigLIP 2 is shown below (taken from the paper).
       primaryClass={cs.CV},
       url={https://arxiv.org/abs/2502.14786},
 }
-```
+```