btw, also observed that "." and a capitalized template influence the confidence quite a bit
Pavel Iakubovskii
qubvel-hf
AI & ML interests
Computer Vision models
Recent Activity
commented on their article 1 day ago
SigLIP 2: A better multilingual vision language encoder
liked a Space 1 day ago
ariG23498/phi4-multimodal
qubvel-hf's activity

commented on
SigLIP 2: A better multilingual vision language encoder
1 day ago
Not sure what's up, as I'm not familiar with this codebase (and have no time to dig in), but for SigLIP what you're supposed to do is compute sigmoid(zimg @ ztxt * temperature + bias).
From what you describe, I would bet the bias and/or temperature are missing?
The ground-truth reference code is https://colab.research.google.com/github/google-research/big_vision/blob/main/big_vision/configs/proj/image_text/SigLIP2_demo.ipynb
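For readers following along, the formula in the comment above can be sketched in a few lines of NumPy. This is only a toy illustration, not the reference implementation: the function name `siglip_probs`, the embedding values, and the `temperature` and `bias` numbers are all made up here, and the L2 normalization step and the transpose (for embeddings stored one per row) are assumptions.

```python
import numpy as np


def siglip_probs(zimg, ztxt, temperature, bias):
    """Return an (n_images, n_texts) matrix of independent match probabilities."""
    # Assumption: embeddings are L2-normalized before the dot product.
    zimg = zimg / np.linalg.norm(zimg, axis=-1, keepdims=True)
    ztxt = ztxt / np.linalg.norm(ztxt, axis=-1, keepdims=True)
    # Scale and shift the cosine similarities, then apply an element-wise sigmoid.
    logits = zimg @ ztxt.T * temperature + bias
    return 1.0 / (1.0 + np.exp(-logits))


# Toy embeddings: one image, two candidate texts (illustrative values only).
zimg = np.array([[1.0, 0.0]])
ztxt = np.array([[1.0, 0.0], [0.0, 1.0]])

with_params = siglip_probs(zimg, ztxt, temperature=10.0, bias=-5.0)
without_params = siglip_probs(zimg, ztxt, temperature=1.0, bias=0.0)
print(with_params)     # one probability near 1, the other near 0
print(without_params)  # both stuck in the narrow band between about 0.27 and 0.73
```

With unit-norm embeddings the raw dot product stays in [-1, 1], so dropping the temperature and bias squeezes every probability toward the middle of the range, which would match the symptom guessed at in the comment.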
Hey @giffmana, temperature and bias are applied under the hood, see
Error while loading processor: TypeError: expected str, bytes or os.PathLike object, not NoneType
7
#2 opened 7 days ago by armamut
question about 'model_type' in config.json
1
#5 opened 3 days ago by XA-hyy

upvoted an article 3 days ago
Article
FastRTC: The Real-Time Communication Library for Python
• 97
Missing Vocab file
2
#4 opened 5 days ago by myn0908

upvoted an article 7 days ago
Article
SigLIP 2: A better multilingual vision language encoder
• 113

upvoted a paper 7 days ago

upvoted a collection 7 days ago

published an article 8 days ago
Article
SigLIP 2: A better multilingual vision language encoder
• 113