Having the coordinates be returned

#2
by TotoB12 - opened

Would it be possible to additionally have the box coordinates returned with the Text Output? Thanks.

I apologize, I cannot figure out how to push to the branch I made, since this Space is in Dev mode.
Here is what I wanted to add:

Modified line 81

    return image, str(parsed_content_list), str(label_coordinates)

Added line 108

        with gr.Column():
            image_output_component = gr.Image(type='pil', label='Image Output')
            text_output_component = gr.Textbox(label='Parsed screen elements', placeholder='Text Output')
            coordinates_output_component = gr.Textbox(label='Coordinates', placeholder='Coordinates Output')  # <-- this one

Modified line 125 (previously 124)

        outputs=[image_output_component, text_output_component, coordinates_output_component]
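One caveat with `str(label_coordinates)`: if the boxes are NumPy arrays or tensors, the resulting text is hard to parse back in another program. Here is a minimal sketch of a JSON alternative, assuming `label_coordinates` is a dict mapping each box label to four numbers. The helper name `coordinates_to_json` is hypothetical and not part of the Space's code.

```python
import json


def coordinates_to_json(label_coordinates):
    """Serialize a {label: box} mapping to a JSON string.

    Assumption: each box is a sequence of four numbers, possibly a
    NumPy array or tensor that exposes .tolist().
    """
    serializable = {}
    for label, box in label_coordinates.items():
        # Arrays and tensors expose tolist(); plain lists and tuples do not.
        values = box.tolist() if hasattr(box, "tolist") else list(box)
        serializable[str(label)] = [float(v) for v in values]
    return json.dumps(serializable)


if __name__ == "__main__":
    # Made-up example values, only to show the output shape.
    print(coordinates_to_json({"0": (0.1, 0.2, 0.3, 0.4)}))
```

The return statement could then pass `coordinates_to_json(label_coordinates)` instead of `str(label_coordinates)`, so that the text in the Coordinates box can be read back with `json.loads`.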

Many thanks

Hello @TotoB12, I just read this issue. Thanks for taking the time to investigate!

Will this output the coordinates as well?

Hey!
Yes, this will display the usual text output (with the text and icon box numbers), with the coordinates in a separate output box.
I only got to test this on a modified CPU-only Space, but I am pretty sure this is all that is needed.
Here is what it would look like:

image.png

awesome! is that something the community wants?

The coordinates are one of the core features of this model. As per the current app.py at line 77:

    dino_labeled_img, label_coordinates, parsed_content_list = get_som_labeled_img(
        image_save_path,
        yolo_model,
        BOX_TRESHOLD=box_threshold,
        output_coord_in_ratio=True,
        ocr_bbox=ocr_bbox,
        draw_bbox_config=draw_bbox_config,
        caption_model_processor=caption_model_processor,
        ocr_text=text,
        iou_threshold=iou_threshold
    )

The coordinates are already being generated when the model is prompted; they are just not being shown.
On this Space, seeing the Text Output and labeled image is nice, but it is useless for actual projects without the full data.
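To illustrate the kind of use I mean, here is a rough sketch of turning one ratio box into a pixel position that a project could click on. It assumes each box is `[x, y, width, height]` expressed as fractions of the image size, which is my reading of `output_coord_in_ratio=True` and should be checked against the actual output. The function names are hypothetical.

```python
def ratio_box_to_pixels(box, image_width, image_height):
    """Convert an assumed [x, y, width, height] ratio box to pixel corners.

    Returns (left, top, right, bottom) in pixels.
    """
    x, y, w, h = box
    left = round(x * image_width)
    top = round(y * image_height)
    right = round((x + w) * image_width)
    bottom = round((y + h) * image_height)
    return left, top, right, bottom


def box_center(pixel_box):
    """Return the integer center (cx, cy) of a (left, top, right, bottom) box."""
    left, top, right, bottom = pixel_box
    return (left + right) // 2, (top + bottom) // 2


if __name__ == "__main__":
    # Made-up box on a 1000x800 screenshot.
    pixels = ratio_box_to_pixels([0.25, 0.5, 0.5, 0.25], 1000, 800)
    print(pixels, box_center(pixels))
```

If the boxes turn out to be `[x1, y1, x2, y2]` instead, only the `right` and `bottom` lines would need to change.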
In the issues tab of the microsoft/OmniParser GitHub repository, we can see that the coordinates are definitely an indispensable part of using the model.

image.png
