Having the coordinates be returned
Would it be possible to additionally have the box coordinates returned with the Text Output? Thanks.
I apologize, I cannot figure out how to push to the branch I made, since this Space is in Dev-mode.
Here is what I wanted to add:
Modified line 81
return image, str(parsed_content_list), str(label_coordinates)
Added line 108
with gr.Column():
    image_output_component = gr.Image(type='pil', label='Image Output')
    text_output_component = gr.Textbox(label='Parsed screen elements', placeholder='Text Output')
    coordinates_output_component = gr.Textbox(label='Coordinates', placeholder='Coordinates Output') <-- this one
Modified line 125 (previously 124)
outputs=[image_output_component, text_output_component, coordinates_output_component]
Many thanks
Hello @TotoB12, I just read this issue. Thanks for taking the time to investigate!
Will this output the coordinates as well?
awesome! is that something the community wants?
The coordinates are one of the core features of this model. In the current app.py, at line 77:
dino_labeled_img, label_coordinates, parsed_content_list = get_som_labeled_img(
    image_save_path,
    yolo_model,
    BOX_TRESHOLD=box_threshold,
    output_coord_in_ratio=True,
    ocr_bbox=ocr_bbox,
    draw_bbox_config=draw_bbox_config,
    caption_model_processor=caption_model_processor,
    ocr_text=text,
    iou_threshold=iou_threshold
)
The coordinates are already generated when the model is prompted; they are just not shown.
On this Space, seeing the Text Output and labeled image is nice, but without the full data the output is of little use in actual projects.
Judging by the issues tab of the microsoft/OmniParser GitHub repository, the coordinates are definitely an indispensable asset when using the model.
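For anyone who wants to use the coordinates once they are exposed, here is a rough sketch of turning them into pixel boxes. It rests on assumptions that are not confirmed in this thread: that label_coordinates is a dict mapping a label ID to [x, y, width, height], and that, because of output_coord_in_ratio=True, each value is a fraction of the image size. The helper name ratio_boxes_to_pixels is hypothetical and not part of app.py.

```python
from typing import Dict, Sequence, Tuple


def ratio_boxes_to_pixels(
    label_coordinates: Dict[str, Sequence[float]],
    image_width: int,
    image_height: int,
) -> Dict[str, Tuple[int, int, int, int]]:
    """Convert ratio boxes to pixel boxes as (left, top, right, bottom).

    Assumption: each value is [x, y, width, height], expressed as a
    fraction of the image width/height (not verified against app.py).
    """
    pixel_boxes = {}
    for label, (x, y, w, h) in label_coordinates.items():
        left = round(x * image_width)
        top = round(y * image_height)
        right = round((x + w) * image_width)
        bottom = round((y + h) * image_height)
        pixel_boxes[label] = (left, top, right, bottom)
    return pixel_boxes


# Made-up example box, only to illustrate the call.
example = {"0": [0.25, 0.5, 0.5, 0.125]}
print(ratio_boxes_to_pixels(example, image_width=1000, image_height=800))
```

If the boxes turn out to be stored in a different layout (for example two corner points), only the four lines computing left, top, right and bottom would need to change.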