SigLIP Collection: Contrastive (sigmoid) image-text models from https://arxiv.org/abs/2303.15343 • 10 items • Updated 29 days ago • 50
Apollo: An Exploration of Video Understanding in Large Multimodal Models • Paper • 2412.10360 • Published 29 days ago • 136