---
license: apache-2.0
language:
- en
datasets:
- ILSVRC/imagenet-1k
---
# Model Card for VIT-MAE-r
VIT-MAE-r is a fine-tuned version of MAE for image reconstruction. We release a version fine-tuned from MAE-Large.
## Model Details
VIT-MAE-r has already been converted to the Hugging Face format and can be loaded directly with the `from_pretrained` method.
### Model Sources
- **Repository:** LM4LV
- **Paper:** LM4LV: A Frozen Large Language Model for Low-level Vision Tasks
- **Source model:** MAE-Large
## How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import AutoImageProcessor, AutoModelForPreTraining

model = AutoModelForPreTraining.from_pretrained("bytetriper/vit-mae-r")
```
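The snippet below sketches a full reconstruction pass. It assumes the repository ships a preprocessor config usable with `AutoImageProcessor`, that the checkpoint loads as a ViT-MAE pretraining model whose `logits` are per-patch pixel predictions, and that `unpatchify` folds them back into an image tensor; the input image path is a placeholder.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForPreTraining

# Placeholder input image; replace with your own file.
image = Image.open("example.jpg").convert("RGB")

# Assumes the repo includes a preprocessor config compatible with AutoImageProcessor.
image_processor = AutoImageProcessor.from_pretrained("bytetriper/vit-mae-r")
model = AutoModelForPreTraining.from_pretrained("bytetriper/vit-mae-r")
model.eval()

inputs = image_processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# For ViT-MAE models, outputs.logits holds per-patch pixel predictions;
# unpatchify reshapes them into a (batch, channels, height, width) tensor.
reconstruction = model.unpatchify(outputs.logits)
print(reconstruction.shape)
```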
## Evaluation
This model achieves an rFID of 1.24 on the ImageNet validation set, evaluated with the standard TensorFlow evaluation tool provided by Guided-Diffusion.
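For context on how such a number is typically computed (a rough sketch, not the authors' exact script): reference images and model reconstructions are saved as uint8 `.npz` batches and compared with the evaluator script from the openai/guided-diffusion repository. The arrays below are placeholders, not real data.

```python
import numpy as np

# Illustrative placeholders: in practice these would be the ImageNet val images
# and the model's reconstructions, as uint8 arrays of shape (N, H, W, 3).
references = np.zeros((10, 256, 256, 3), dtype=np.uint8)
reconstructions = np.zeros((10, 256, 256, 3), dtype=np.uint8)

# Guided-Diffusion's evaluator expects .npz batches of uint8 images.
np.savez("ref_batch.npz", arr_0=references)
np.savez("sample_batch.npz", arr_0=reconstructions)

# FID between the two batches (rFID when the samples are reconstructions) is
# then computed with the evaluator from the guided-diffusion repository, e.g.:
#   python evaluations/evaluator.py ref_batch.npz sample_batch.npz
```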
## Citation
**BibTeX:**
```bibtex
@article{zheng2024lm4lv,
  title={LM4LV: A Frozen Large Language Model for Low-level Vision Tasks},
  author={Zheng, Boyang and Gu, Jinjin and Li, Shijun and Dong, Chao},
  journal={arXiv preprint arXiv:2405.15734},
  year={2024}
}
```
## Model Card Authors
Boyang Zheng