How to postprocess the coordinates coming from "Point to <something>"

by hadim - opened Sep 25, 2024

Sep 25, 2024

The output is an XML string similar to <points x1="9.9" y1="83.8" x2="20.9" y2="2.7" alt="something">something</points>, but when I try to plot those coordinates on the original input image, the scaling and the position seem to be very off.

I tried scaling up using the default model image size (336, 336) but it does not work. Any idea?

Muennighoff

Ai2 org Sep 25, 2024

You need to scale using the original size of your input image.

hadim

Sep 25, 2024

So the original image size is (3008, 2000) and the model default input image size is (336, 336) (according to config.vision_backbone["image_default_input_size"]).

So I tried

x_factor = image.size[0] / input_size[0]  # ~8.95
y_factor = image.size[1] / input_size[1]  # ~5.95

but the scaling factors are still too small. I found manually that the correct ones are x_factor=30 and y_factor=19.5.

Am I missing something? Can you provide a snippet that computes the scaling factor?

Muennighoff

Ai2 org Sep 25, 2024

Maybe @sanghol can chime in here?

sanghol

Ai2 org Sep 25, 2024

Hi, our model generates pointing outputs to be easily rendered on images in HTML, e.g. in the format of <div class="dot" style="left: {x}%; top: {y}%;"></div>.
So x and y are percentages of the image width and height: you need to divide the x and y coordinates by 100 before multiplying by the image width and height.
Thus, the actual locations of the points would be (x1, y1) = (297.792, 1676) and (x2, y2) = (628.672, 54) (I assumed that w=3008 and h=2000).
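
For anyone landing here later, the conversion described above can be sketched in Python roughly as follows. This is only an illustration, not code from the model repository: the helper name `points_to_pixels` is made up for this example, and the regex assumes the attributes look like the `x1="…" y1="…"` pairs in the output quoted at the top of the thread (an unnumbered `x`/`y` pair is accepted too, as a convenience).

```python
import re


def points_to_pixels(xml_text, width, height):
    """Convert percentage coordinates from a points string to pixel coordinates.

    Hypothetical helper: assumes attributes of the form x1="..." y1="...".
    """
    coords = {}
    # Collect every x/y attribute, grouped by its (possibly empty) numeric suffix.
    for axis, idx, value in re.findall(r'\b([xy])(\d*)="([\d.]+)"', xml_text):
        coords.setdefault(idx, {})[axis] = float(value)

    pixels = []
    # Keep the points in numeric order (x1/y1, x2/y2, ...).
    for idx in sorted(coords, key=lambda s: int(s or 0)):
        point = coords[idx]
        # Values are percentages: divide by 100, then scale by the ORIGINAL image size.
        pixels.append((point["x"] / 100 * width, point["y"] / 100 * height))
    return pixels


text = '<points x1="9.9" y1="83.8" x2="20.9" y2="2.7" alt="something">something</points>'
print(points_to_pixels(text, 3008, 2000))
```

With a 3008 x 2000 image this should print values very close to the ones quoted above (small floating-point differences aside). When plotting with PIL, `image.size` is `(width, height)`, so `points_to_pixels(text, *image.size)` would be the matching call.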

hadim

Sep 25, 2024

Thanks, that works like a charm (maybe you should document that somewhere!).

hadim changed discussion status to closed Sep 25, 2024
