Designing Scalable Vision Models in the Vision-Language Era. The best-performing model is 'jienengchen/ViTamin-XL-384px'.
jienengchen/ViTamin-XL-384px
Feature Extraction • Updated • 140 • 17
jienengchen/ViTamin-L-336px
Feature Extraction • Updated • 15 • 5
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Paper • 2404.02132 • Published • 2
jienengchen/ViTamin-XL-336px
Feature Extraction • Updated • 5