---
license: apache-2.0
datasets:
- imagenet-1k
- ade20k
metrics:
- accuracy
- mIoU
pipeline_tag: image-classification
---

# VisionLLaMA-Base-MAE

VisionLLaMA-Base-MAE is trained on ImageNet-1k without labels using the Masked Autoencoder (MAE) paradigm. It shows substantial improvements on ImageNet-1K classification, both supervised fine-tuning (SFT) and linear probing, and on ADE20K semantic segmentation.

| Model | ImageNet Acc (SFT) | ImageNet Acc (Linear Probe) | ADE20K Segmentation (mIoU) |
| -- | -- | -- | -- |
| VisionLLaMA-Base-MAE (ep800) | 84.0 | 69.7 | 49.0 |
| VisionLLaMA-Base-MAE (ep1600) | 84.3 | 71.7 | 50.2 |

# How to Use

Please refer to the [GitHub](https://github.com/Meituan-AutoML/VisionLLaMA) page for usage; a minimal checkpoint-loading sketch is included after the citation below.

# Citation

```
@article{chu2024visionllama,
  title={VisionLLaMA: A Unified LLaMA Interface for Vision Tasks},
  author={Chu, Xiangxiang and Su, Jianlin and Zhang, Bo and Shen, Chunhua},
  journal={arXiv preprint arXiv:2403.00522},
  year={2024}
}
```
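The model classes and training scripts live in the GitHub repository above. As a rough orientation only, MAE-style pretraining checkpoints are commonly distributed as plain PyTorch state dicts; the sketch below illustrates loading one under that assumption. The checkpoint filename and the `visionllama_base` constructor are hypothetical placeholders, not the repository's documented API.

```python
# Minimal sketch: load a pretrained MAE checkpoint into an encoder.
# Assumptions (not confirmed by this card): the checkpoint is a standard
# PyTorch state dict, possibly nested under a "model" key as is common
# for MAE-style releases.
import torch

# Hypothetical local path to a downloaded checkpoint.
ckpt_path = "visionllama_base_mae_ep1600.pth"

checkpoint = torch.load(ckpt_path, map_location="cpu")
# Unwrap the weights if they are stored under a "model" key.
state_dict = checkpoint.get("model", checkpoint)

# The encoder class comes from the Meituan-AutoML/VisionLLaMA repo;
# "visionllama_base" is a placeholder name for illustration.
# model = visionllama_base()
# missing, unexpected = model.load_state_dict(state_dict, strict=False)
# print("missing:", missing, "unexpected:", unexpected)
```

`strict=False` is used in the sketch because MAE checkpoints typically include decoder weights that a downstream encoder does not need; check the printed missing/unexpected keys to confirm only decoder parameters are skipped.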