license: apache-2.0
datasets:
  - imagenet-1k
  - ade20k
metrics:
  - accuracy
  - mIoU
pipeline_tag: image-classification

VisionLLaMA-Base-MAE

VisionLLaMA-Base-MAE is pretrained on ImageNet-1K without labels, following the Masked Autoencoder (MAE) paradigm. It shows substantial improvements on ImageNet-1K classification, under both supervised fine-tuning (SFT) and linear probing, and on ADE20K segmentation.
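As a rough illustration of the masking step at the heart of MAE-style pretraining, the sketch below splits an image's patch indices into a visible set (fed to the encoder) and a masked set (to be reconstructed). It is not the authors' training code: the function name `random_patch_mask`, the 75% mask ratio, and the 224x224 image with 16x16 patches are all assumptions chosen for illustration and are not stated in this model card.

```python
import random


def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into (visible, masked) lists by random sampling.

    Hypothetical helper for illustration only; the 0.75 default is an
    assumption, not a value taken from this model card.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)  # random permutation of patch positions
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(indices[:num_masked])   # hidden from the encoder
    visible = sorted(indices[num_masked:])  # the only patches the encoder sees
    return visible, masked


# Assumed setup: a 224x224 image cut into 16x16 patches gives 14 * 14 = 196 patches.
visible, masked = random_patch_mask(196, mask_ratio=0.75, seed=0)
print(len(visible), len(masked))  # 49 147
```

Because the encoder only processes the visible patches, no labels are needed: the training signal comes from reconstructing the masked ones.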

| Model | ImageNet Acc (SFT) | ImageNet Acc (Linear Probe) | ADE20K Segmentation (mIoU) |
| --- | --- | --- | --- |
| VisionLLaMA-Base-MAE (ep800) | 84.0 | 69.7 | 49.0 |
| VisionLLaMA-Base-MAE (ep1600) | 84.3 | 71.7 | 50.2 |

How to Use

Please refer to the GitHub page for usage.