
Problem with demo code using pipeline

#2
by Neman - opened

There are several issues with the demo code in the model card:

from transformers import pipeline

# load pipeline
ckpt = "google/siglip2-large-patch16-512"
image_classifier = pipeline(model=ckpt, task="zero-shot-image-classification")

# load image and candidate labels
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
candidate_labels = ["2 cats", "a plane", "a remote"]

# run inference
outputs = image_classifier(image, candidate_labels)
print(outputs)

Some are easy to solve. The image must be loaded from the URL first:

from transformers.image_utils import load_image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = load_image(url)

candidate_labels is not accepted as a positional parameter, so this doesn't work:

outputs = image_classifier(image, candidate_labels)

The candidate_labels parameter needs to be passed as a keyword argument:

outputs = image_classifier(image, candidate_labels=candidate_labels)

Now to the problem I didn't find a solution for:

Exception has occurred: ValueError
Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (`input_ids` in this case) have excessive nesting (inputs type `list` where type `int` is expected).
ValueError: expected sequence of length 10 at dim 1 (got 9)

The above exception was the direct cause of the following exception:

  File "/home/neman/PROGRAMMING/PYTHON/MULTIMODAL_TESTS/SigLIP2_test_1.py", line 14, in <module>
    outputs = image_classifier(image, candidate_labels=candidate_labels)
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (`input_ids` in this case) have excessive nesting (inputs type `list` where type `int` is expected).

Does candidate_labels need to be passed in a different format? Through the tokenizer?
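One way to sidestep the pipeline entirely is to call the model and processor directly with explicit padding. This is only a sketch of my understanding, not an official fix: SigLIP-family checkpoints are trained with padding="max_length", which is exactly what the error message hints at.

```python
import torch
from transformers import AutoModel, AutoProcessor
from transformers.image_utils import load_image

ckpt = "google/siglip2-large-patch16-512"
model = AutoModel.from_pretrained(ckpt)
processor = AutoProcessor.from_pretrained(ckpt)

image = load_image("http://images.cocodataset.org/val2017/000000039769.jpg")
candidate_labels = ["2 cats", "a plane", "a remote"]

# padding="max_length" pads every label to the same length,
# avoiding the ragged-batch ValueError from the pipeline
inputs = processor(text=candidate_labels, images=image,
                   padding="max_length", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# SigLIP applies a sigmoid to the logits (not a softmax over labels)
probs = torch.sigmoid(outputs.logits_per_image)
print(probs)
```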

OK, I thought the latest transformers version (4.49.0) had support for SigLIP2. I updated to the dev build (transformers-4.50.0.dev0) and it works.
Maybe add an instruction to the model card to pip install git+https://github.com/huggingface/transformers,
and correct the image loading and the keyword argument in the demo code.

Besides that, thanks for the great new model! :)
