Commit a601b90 (verified) by nielsr (HF staff) · 1 parent: a713301

Add image-text-to-text pipeline tag

This PR updates the model card metadata to use the `image-text-to-text` pipeline tag. This tag better reflects the model's multimodal capabilities, including image captioning and visual question answering, as demonstrated in the provided examples and described in the paper. This change improves the model's discoverability on the Hub for users seeking vision-language models.
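
To check which tag a model card declares after a change like this, one can read the `pipeline_tag` key from the YAML front matter at the top of `README.md`. The sketch below is a simplified, standard-library-only illustration: `read_front_matter` is a hypothetical helper, not part of the Hub or Transformers tooling, and it only handles top-level `key: value` pairs. A real script would more likely use a full YAML parser.

```python
def read_front_matter(text: str) -> dict[str, str]:
    """Return the top-level scalar keys of a model card's YAML front matter.

    Hypothetical helper for illustration only: nested values such as the
    `tags` and `widget` lists are skipped rather than parsed.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter block at the top of the file
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing delimiter of the front matter
        # Skip blank lines, indented or list entries, and comments.
        if not line or line[0] in " -#" or ":" not in line:
            continue
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta


# Front matter as it appears on the new side of this diff (widget omitted).
card = """---
library_name: transformers
license: apache-2.0
pipeline_tag: image-text-to-text
tags:
- vision
---

# SigLIP 2 Giant
"""

meta = read_front_matter(card)
print(meta["pipeline_tag"])
```

Keys whose values are lists, such as `tags`, come back as empty strings here, which is enough for checking a single scalar field like `pipeline_tag`.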

Files changed (1)
  1. README.md +7 -8
README.md CHANGED
@@ -1,19 +1,18 @@
  ---
+ library_name: transformers
  license: apache-2.0
+ pipeline_tag: image-text-to-text
  tags:
  - vision
  widget:
- - src: >-
-     https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg
-   candidate_labels: bee in the sky, bee on the flower
-   example_title: Bee
- library_name: transformers
- pipeline_tag: zero-shot-image-classification
+ - src: https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg
+   candidate_labels: bee in the sky, bee on the flower
+   example_title: Bee
  ---

  # SigLIP 2 Giant

- [SigLIP 2](https://huggingface.co/papers/2502.14786) extends the pretraining objective of
+ [SigLIP 2](https://hf.co/papers/2502.14786) extends the pretraining objective of
  [SigLIP](https://huggingface.co/papers/2303.15343) with prior, independently developed techniques
  into a unified recipe, for improved semantic understanding, localization, and dense features.

@@ -99,4 +98,4 @@ Evaluation of SigLIP 2 is shown below (taken from the paper).
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2502.14786},
  }
- ```
+ ```