---
license: apache-2.0
datasets:
- imagenet-1k
- ade20k
metrics:
- accuracy
- mIoU
pipeline_tag: image-classification
---

# VisionLLaMA-Base-MAE

VisionLLaMA-Base-MAE is pre-trained on ImageNet-1K without labels, following the Masked Autoencoder (MAE) paradigm. It shows substantial improvements on ImageNet-1K classification, under both supervised fine-tuning (SFT) and linear probing, and on ADE20K segmentation.

| Model | ImageNet-1K Acc. (SFT, %) | ImageNet-1K Acc. (Linear Probe, %) | ADE20K Segmentation (mIoU) |
| -- | -- | -- | -- |
| VisionLLaMA-Base-MAE (ep800) | 84.0 | 69.7 | 49.0 |
| VisionLLaMA-Base-MAE (ep1600) | 84.3 | 71.7 | 50.2 |
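
For readers unfamiliar with masked-autoencoder pre-training, the core idea is to hide a random subset of image patches and train the model to reconstruct them from the visible ones. The snippet below is a minimal, framework-free sketch of that random masking step only. It is not taken from the VisionLLaMA code base: the function name `random_patch_mask`, the 75% mask ratio (the default reported in the original MAE paper), and the 224x224 image with 16x16 patches are all assumptions made for illustration.

```python
import random


def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into (visible, masked) lists, MAE-style.

    Hypothetical helper for illustration. mask_ratio=0.75 is an assumption
    borrowed from the original MAE paper; this model card does not state
    the ratio used for VisionLLaMA-Base-MAE.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)  # random permutation of patch positions
    num_keep = int(num_patches * (1.0 - mask_ratio))
    visible = sorted(indices[:num_keep])  # patches the encoder would see
    masked = sorted(indices[num_keep:])   # patches to be reconstructed
    return visible, masked


# Assumed setup: a 224x224 image cut into 16x16 patches gives a 14x14 grid.
grid = 224 // 16
visible, masked = random_patch_mask(grid * grid, mask_ratio=0.75, seed=0)
print(len(visible), len(masked))  # 49 147
```

In a real pipeline the visible patches would be embedded and passed through the encoder, and a lightweight decoder would predict the pixels of the masked patches. Those stages are omitted here.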


# How to Use

Please refer to the [GitHub](https://github.com/Meituan-AutoML/VisionLLaMA) repository for usage instructions.

# Citation

```
@article{chu2024visionllama,
  title={VisionLLaMA: A Unified LLaMA Interface for Vision Tasks},
  author={Chu, Xiangxiang and Su, Jianlin and Zhang, Bo and Shen, Chunhua},
  journal={arXiv preprint arXiv:2403.00522},
  year={2024}
}
```