AutoModel.from_pretrained error in loading state_dict

#3
by Srymaker - opened

Why does this happen? I have already tried updating transformers to the latest version.
image.png

image.png

Same problem here.

I hit the same error. It seems that the text prediction head (weight and bias) has shape [1152, 1152] in the current transformers code, while the weights the authors provided have shape [1536, 1152], which matches the visual token output.

image.png
The bug is here, in the current transformers source code.
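
For anyone who wants to confirm the mismatch locally, here is a minimal sketch of the comparison. It only assumes two name-to-shape mappings, one for the model as transformers builds it and one for the provided checkpoint. The parameter name `text_model.head.weight` is a hypothetical placeholder, not necessarily the real key; the shapes are the ones quoted above.

```python
def find_shape_mismatches(model_shapes, checkpoint_shapes):
    """Return {name: (model_shape, checkpoint_shape)} for every parameter
    that exists in both mappings but with different shapes."""
    mismatches = {}
    for name, model_shape in model_shapes.items():
        ckpt_shape = checkpoint_shapes.get(name)
        # Skip parameters missing from the checkpoint; only report shape conflicts.
        if ckpt_shape is not None and tuple(ckpt_shape) != tuple(model_shape):
            mismatches[name] = (tuple(model_shape), tuple(ckpt_shape))
    return mismatches


# Hypothetical key name; shapes taken from the comment above.
model_shapes = {"text_model.head.weight": (1152, 1152)}
checkpoint_shapes = {"text_model.head.weight": (1536, 1152)}

mismatches = find_shape_mismatches(model_shapes, checkpoint_shapes)
for name, (expected, found) in mismatches.items():
    print(f"{name}: model expects {expected}, checkpoint has {found}")
```

With PyTorch, each mapping can be built from a state dict with `{k: tuple(v.shape) for k, v in state_dict.items()}`.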
