Question about Image Preprocessing

#13

by alghisius - opened 12 days ago

12 days ago

I have a question about the image preprocessing: from reading the code, it seems the first preprocessing step subdivides the image into several crops, while the last step simply resizes the whole image (a sort of global representation). So in the simplest case of a 336x336x3 image, the same representation is appended twice.

This stack of crops is then passed to the ViT and processed individually, right?

Thank you for your reply.

amanrangapur

Ai2 org 12 days ago

@alghisius Yes, the image is divided into multiple crops of 336x336 pixels. Details of the cropping strategy are in the preprocessing section of the Hugging Face model code, and further information will be provided in an upcoming paper (to be released by the end of November).
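
For readers following along, here is a rough sketch of the idea discussed above: tile the image into 336x336 crops and append one resized copy of the whole image as a global view. This is not the model's actual preprocessing code. The function names `resize_nearest` and `build_crop_stack`, the non-overlapping tiling rule, and the nearest-neighbour resize are all assumptions made for illustration; the real cropping strategy is in the model's preprocessing code and may differ (for example in overlap, padding, or ordering).

```python
import numpy as np

CROP = 336  # crop side length mentioned in the thread


def resize_nearest(image: np.ndarray, height: int, width: int) -> np.ndarray:
    """Nearest-neighbour resize; a stand-in for the real interpolation (assumption)."""
    rows = np.arange(height) * image.shape[0] // height
    cols = np.arange(width) * image.shape[1] // width
    return image[rows][:, cols]


def build_crop_stack(image: np.ndarray, crop: int = CROP) -> np.ndarray:
    """Return an array of shape (n_crops + 1, crop, crop, channels)."""
    h, w = image.shape[:2]
    # Hypothetical tiling rule: round each side up to a whole number of crops.
    n_rows = max(1, -(-h // crop))
    n_cols = max(1, -(-w // crop))
    scaled = resize_nearest(image, n_rows * crop, n_cols * crop)
    tiles = [
        scaled[r * crop:(r + 1) * crop, c * crop:(c + 1) * crop]
        for r in range(n_rows)
        for c in range(n_cols)
    ]
    # Last entry: the whole image resized to a single crop (the global view).
    tiles.append(resize_nearest(image, crop, crop))
    return np.stack(tiles)


if __name__ == "__main__":
    small = np.zeros((336, 336, 3), dtype=np.uint8)
    large = np.zeros((672, 1008, 3), dtype=np.uint8)
    print(build_crop_stack(small).shape)
    print(build_crop_stack(large).shape)
```

Under these assumptions, a 336x336x3 input yields a stack of two identical entries, which matches the observation in the question. The leading axis of the stack can then be treated as a batch dimension, so that each crop is encoded by the ViT on its own.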
