Image Labels: One-shot image-conditioned object detection
#5 opened by godaspeg
Is it possible to detect objects using images as labels instead of text? Since OWL-ViT is based on CLIP embeddings, I think this should be theoretically possible.
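Here is the idea behind the question. A CLIP-style open-vocabulary detector predicts one embedding per candidate box and scores each box against a query embedding, which is normally produced from text. If an image embedding lives in the same space, it can take the text query's place. The following is a simplified, hypothetical illustration in plain Python. `match_boxes` and the toy 3-dimensional vectors are made up for this example. The real OWL-ViT/OWLv2 class head applies learned normalization and logit scaling rather than raw cosine similarity with a fixed threshold.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_boxes(
    box_embeddings: Sequence[Sequence[float]],
    query_embedding: Sequence[float],
    threshold: float,
) -> List[Tuple[int, float]]:
    """Hypothetical one-shot matching: keep boxes whose embedding is close to the
    query-image embedding, sorted from most to least similar."""
    scored = [(i, cosine_similarity(e, query_embedding)) for i, e in enumerate(box_embeddings)]
    hits = [(i, s) for i, s in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy per-box embeddings; in a real model these come from the detector's class head.
    boxes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
    # Toy embedding of the example image used as the "label".
    query = [1.0, 0.0, 0.0]
    print(match_boxes(boxes, query, threshold=0.8))
```

Swapping the text embedding for an image embedding changes nothing else in this scoring step. That is why the CLIP-based design makes image-conditioned (one-shot) detection plausible.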
godaspeg changed the discussion title from "Image Labels" to "Image Labels: One-shot image-conditioned object detection"
Yes, image-guided (one-shot) object detection is supported. See the demo notebook: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/OWLv2/Zero_and_one_shot_object_detection_with_OWLv2.ipynb
godaspeg changed the discussion status to closed