DAMO-NLP-SG/VL3-SigLIP-NaViT
Language Technology Lab at Alibaba DAMO Academy
Tags: Image Feature Extraction, Transformers, Safetensors, English, videollama3_vision_encoder, feature-extraction, visual-encoder, multi-modal-large-language-model, custom_code
arXiv: 2501.13106, 2406.07476, 2306.02858
License: apache-2.0
Latest commit: 557050f (verified), "Update README.md" by Cyril666, 10 days ago
2 contributors, 11 commits in history
File                                  Scan  Size          Last commit                             Updated
.gitattributes                        Safe  1.52 kB       initial commit                          14 days ago
README.md                             Safe  4.57 kB       Update README.md                        10 days ago
config.json                           Safe  602 Bytes     Upload model                            14 days ago
configuration_videollama3_encoder.py  Safe  969 Bytes     Upload model                            14 days ago
image_processing_videollama3.py       Safe  21.9 kB       Update image_processing_videollama3.py  10 days ago
model.safetensors                     Safe  824 MB (LFS)  Upload model                            14 days ago
modeling_videollama3_encoder.py       Safe  21.8 kB       Upload model                            14 days ago
preprocessor_config.json              Safe  473 Bytes     Upload processor                        14 days ago