---
license: apache-2.0
language:
- en
datasets:
- ILSVRC/imagenet-1k
---
# Model Card for VIT-MAE-r
VIT-MAE-r is a fine-tuned version of MAE for image reconstruction. We release a version fine-tuned from MAE-Large.
## Model Details
VIT-MAE-r has already been converted to the Hugging Face format and can be loaded directly with the `from_pretrained` method.
### Model Sources
- **Repository:** LM4LV
- **Paper:** LM4LV: A Frozen Large Language Model for Low-level Vision Tasks
- **Source model:** MAE-Large
## How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import AutoImageProcessor, AutoModelForPreTraining

model = AutoModelForPreTraining.from_pretrained("bytetriper/vit-mae-r")
```
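The snippet below sketches a full reconstruction pass. It assumes the repository ships a preprocessor config usable with `AutoImageProcessor`, that the checkpoint loads as a ViT-MAE pretraining model whose `logits` are per-patch pixel predictions, and that `unpatchify` folds them back into an image tensor; the input image path is a placeholder.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForPreTraining

# Placeholder input image; replace with your own file.
image = Image.open("example.jpg").convert("RGB")

# Assumes the repo includes a preprocessor config compatible with AutoImageProcessor.
image_processor = AutoImageProcessor.from_pretrained("bytetriper/vit-mae-r")
model = AutoModelForPreTraining.from_pretrained("bytetriper/vit-mae-r")
model.eval()

inputs = image_processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# For ViT-MAE models, outputs.logits holds per-patch pixel predictions;
# unpatchify reshapes them into a (batch, channels, height, width) tensor.
reconstruction = model.unpatchify(outputs.logits)
print(reconstruction.shape)
```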
## Evaluation
This model achieves an rFID of 1.24 on the ImageNet validation set, evaluated with the standard TensorFlow evaluation tool provided by Guided-Diffusion.
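For context on how such a number is typically computed (a rough sketch, not the authors' exact script): reference images and model reconstructions are saved as uint8 `.npz` batches and compared with the evaluator script from the openai/guided-diffusion repository. The arrays below are placeholders, not real data.

```python
import numpy as np

# Illustrative placeholders: in practice these would be the ImageNet val images
# and the model's reconstructions, as uint8 arrays of shape (N, H, W, 3).
references = np.zeros((10, 256, 256, 3), dtype=np.uint8)
reconstructions = np.zeros((10, 256, 256, 3), dtype=np.uint8)

# Guided-Diffusion's evaluator expects .npz batches of uint8 images.
np.savez("ref_batch.npz", arr_0=references)
np.savez("sample_batch.npz", arr_0=reconstructions)

# FID between the two batches (rFID when the samples are reconstructions) is
# then computed with the evaluator from the guided-diffusion repository, e.g.:
#   python evaluations/evaluator.py ref_batch.npz sample_batch.npz
```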
## Citation
**BibTeX:**
```bibtex
@article{zheng2024lm4lv,
  title={LM4LV: A Frozen Large Language Model for Low-level Vision Tasks},
  author={Zheng, Boyang and Gu, Jinjin and Li, Shijun and Dong, Chao},
  journal={arXiv preprint arXiv:2405.15734},
  year={2024}
}
```
## Model Card Authors
Boyang Zheng