DAMO-NLP-SG/VL3-SigLIP-NaViT
Language Technology Lab at Alibaba DAMO Academy
Image Feature Extraction
Transformers
Safetensors
English
videollama3_vision_encoder
feature-extraction
visual-encoder
multi-modal-large-language-model
custom_code
arxiv: 2501.13106
arxiv: 2406.07476
arxiv: 2306.02858
License: apache-2.0
2 contributors
History: 1 commit
ClownRat: initial commit (3eb707c, verified), 13 days ago
.gitattributes (Safe, 1.52 kB): initial commit, 13 days ago