Pavel Iakubovskii

qubvel-hf

AI & ML interests

Computer Vision models


Organizations

Hugging Face, PyTorch Image Models, Peking University, Hugging Face Internal Testing Organization, Huggingface Projects, Hugging Face OSS Metrics, Hugging Face for Computer Vision, kotol, yorg, CVPR2024, Hugging Face Discord Community, nltpt, s0409, Segmentation Models Pytorch, smp-test, University of Sydney, s0225, ETH Zurich - Computer Vision and Geometry Lab

qubvel-hf's activity


btw, I also observed that a trailing "." and capitalization in the prompt template influence the confidence quite a bit.


Not sure what's up, as I'm not familiar with this codebase (and have no time to dig in), but for SigLIP what you're supposed to do is compute sigmoid(zimg @ ztxt.T * temperature + bias).

From what you describe, I would bet the bias and/or temperature are missing?
The ground-truth reference code is https://colab.research.google.com/github/google-research/big_vision/blob/main/big_vision/configs/proj/image_text/SigLIP2_demo.ipynb
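The scoring rule quoted above can be sketched in NumPy roughly as follows. This is a minimal illustration, not the big_vision or transformers implementation: it assumes L2-normalized embeddings and a temperature that is already exponentiated (some implementations store it in log space), and the helper names and toy values are hypothetical. It also shows why a missing bias is a plausible suspect: unrelated pairs land at 0.5 instead of near 0.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    # Scale each embedding (row) to unit length so the dot product is a cosine similarity.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def siglip_probs(zimg: np.ndarray, ztxt: np.ndarray,
                 temperature: float, bias: float) -> np.ndarray:
    """Independent match probability for every (image, text) pair.

    zimg: (num_images, dim), ztxt: (num_texts, dim).
    Returns (num_images, num_texts). Each entry gets its own sigmoid,
    so rows are not expected to sum to 1 (unlike a softmax over texts).
    Assumption: `temperature` is already exponentiated.
    """
    logits = l2_normalize(zimg) @ l2_normalize(ztxt).T * temperature + bias
    return sigmoid(logits)


# Hypothetical toy embeddings: image i matches text i; text 2 matches no image.
zimg = np.array([[3.0, 0.0, 0.0],
                 [0.0, 0.5, 0.0]])
ztxt = np.array([[1.0, 0.0, 0.0],
                 [0.0, 2.0, 0.0],
                 [0.0, 0.0, 1.0]])

# Hypothetical parameter values chosen for illustration, not trained ones.
with_bias = siglip_probs(zimg, ztxt, temperature=20.0, bias=-10.0)
no_bias = siglip_probs(zimg, ztxt, temperature=20.0, bias=0.0)

print(with_bias.round(4))  # matched pairs near 1.0, unmatched pairs near 0.0
print(no_bias.round(4))    # unmatched pairs sit at exactly 0.5 without the bias
```

Because each pair is scored with its own sigmoid, dropping the bias (or the temperature) shifts every probability at once rather than just reordering candidates, which is one way confidences can look off even when the ranking is still sensible.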

Hey @giffmana, temperature and bias are already applied under the hood; see:

Siglip
https://github.com/huggingface/transformers/blob/17792556b21b4da0dbb9e4b59b39fb34aae4047c/src/transformers/models/siglip/modeling_siglip.py#L1411-L1417

Siglip2
https://github.com/huggingface/transformers/blob/17792556b21b4da0dbb9e4b59b39fb34aae4047c/src/transformers/models/siglip2/modeling_siglip2.py#L1459-L1465

upvoted an article 3 days ago
Article

FastRTC: The Real-Time Communication Library for Python

• 97
New activity in google/siglip2-base-patch16-224 5 days ago

Missing Vocab file

2
#4 opened 5 days ago by myn0908
upvoted an article 7 days ago
Article

SigLIP 2: A better multilingual vision language encoder

• 113
published an article 8 days ago
Article

SigLIP 2: A better multilingual vision language encoder

• 113