Question about Image Preprocessing

#13
by alghisius - opened

I have a question about the image preprocessing. Reading the code, it seems that the first preprocessing step subdivides the image into several crops, while the last step simply resizes the whole image (a sort of global representation). So in the simplest case of a 336x336x3 input, the same representation is appended twice.

This stack of crops is then passed to the ViT and processed individually, right?

Thank you for your reply.

@alghisius Yes, the image is divided into multiple patches of 336x336 pixels each. Details of the cropping strategy are available in the preprocessing section of the Hugging Face model code, and further information will be provided in an upcoming paper (to be released by the end of November).
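
To make the behavior described in the question concrete, here is a minimal sketch, not the model's actual preprocessing code. It assumes the image sides are already multiples of 336 and uses a nearest-neighbour resize as a stand-in for the real resizer; the names `resize_nearest` and `build_crop_stack` are hypothetical and do not come from the model repository.

```python
import numpy as np

CROP = 336  # crop side length mentioned in this thread


def resize_nearest(image, size):
    """Nearest-neighbour resize to (size, size, C).

    Assumption: a simple stand-in for whatever resizer the real code uses.
    """
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return image[rows][:, cols]


def build_crop_stack(image, crop=CROP):
    """Tile the image into crop x crop pieces, then append a resized global view.

    Hypothetical helper: the real code may pad or pick a target resolution
    first; this sketch only accepts sides that are multiples of `crop`.
    """
    h, w, _ = image.shape
    if h % crop or w % crop:
        raise ValueError("this sketch assumes sides are multiples of the crop size")
    tiles = [
        image[top:top + crop, left:left + crop]
        for top in range(0, h, crop)
        for left in range(0, w, crop)
    ]
    global_view = resize_nearest(image, crop)  # whole image shrunk to one crop
    return np.stack(tiles + [global_view])     # shape: (num_tiles + 1, crop, crop, C)
```

Under these assumptions, a 336x336x3 input yields a stack of shape (2, 336, 336, 3) whose two entries are identical, which is the case raised in the question. A 672x1008x3 input yields six tiles plus the global view.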
