docs: add findings and future work section
intro.md
@@ -25,6 +25,20 @@ We present three demos, which each illustrate different use cases of KoCLIP.
* *Text to Image*: This is essentially an image retrieval task. Given a text query, the model searches a database of pre-computed image embeddings to retrieve the image that best matches the given text.
* *Text to Patch*: This is also a variant of zero-shot image classification. Given a text and an image, the image is partitioned into subsections, and the model ranks them by their relevance to the text query.
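
The following is a minimal Python sketch of the retrieval and ranking steps described above. It does not use any framework and is not KoCLIP's implementation. It assumes the text and image (or patch) embeddings have already been produced by the model's encoders, and it scores relevance with cosine similarity, as CLIP-style models commonly do. A fixed grid is used here as one possible way to partition an image. In a real pipeline, each box would then be cropped and encoded before ranking; that step is not shown. The names `cosine_similarity`, `rank_by_text`, and `grid_boxes` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_text(text_emb: Vector, candidate_embs: Sequence[Vector]) -> List[int]:
    """Indices of candidates, most relevant to the text query first.

    Candidates may be pre-computed image embeddings (Text to Image) or
    embeddings of patches cut from a single image (Text to Patch).
    """
    scores = [cosine_similarity(text_emb, c) for c in candidate_embs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def grid_boxes(width: int, height: int, rows: int, cols: int) -> List[Tuple[int, int, int, int]]:
    """Hypothetical partitioning: split a width x height image into a
    rows x cols grid of (left, top, right, bottom) boxes, row by row."""
    boxes = []
    for r in range(rows):
        for c in range(cols):
            boxes.append((
                c * width // cols,
                r * height // rows,
                (c + 1) * width // cols,
                (r + 1) * height // rows,
            ))
    return boxes


# Toy 2-D embeddings standing in for real encoder outputs.
text = [1.0, 0.0]
images = [[0.0, 1.0], [0.9, 0.1], [0.5, 0.5]]
print(rank_by_text(text, images))      # [1, 2, 0]
print(grid_boxes(224, 224, 2, 2)[:2])  # [(0, 0, 112, 112), (112, 0, 224, 112)]
```

The image embeddings are computed ahead of time. As a result, a Text to Image query in this sketch needs only one text embedding plus a similarity pass over the database. Text to Patch reuses the same ranking function, applied to patch embeddings instead.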

## Prompting

We found that KoCLIP performs better when prompting is used to induce zero-shot behavior. That is, instead of feeding the model a single word or short phrase, wrapping the query in a template such as

```
이것은 {{}} 이다 (This is {{}}.)
```

noticeably helped the model. We hypothesize that this is due to the nature of the captions in the MSCOCO dataset, which are usually full sentences, albeit sometimes short ones.
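
The snippet below is a minimal sketch of how such a template might be applied in zero-shot classification. It is not KoCLIP's code. `TEMPLATE`, `build_prompts`, `zero_shot_classify`, and the `encode_text` callable are hypothetical names. Reading the `{{}}` placeholder as a plain `str.format` slot `{}` is also an assumption.

```python
from typing import Callable, List, Sequence

# Assumption: the {{}} placeholder in the template above corresponds to a
# plain str.format slot "{}" once the template is filled in.
TEMPLATE = "이것은 {} 이다"  # "This is {}."


def build_prompts(labels: Sequence[str], template: str = TEMPLATE) -> List[str]:
    """Wrap each bare class label in the prompt template."""
    return [template.format(label) for label in labels]


def zero_shot_classify(
    image_emb: Sequence[float],
    labels: Sequence[str],
    encode_text: Callable[[str], Sequence[float]],
    template: str = TEMPLATE,
) -> str:
    """Pick the label whose *prompted* text embedding best matches the image.

    encode_text is a hypothetical stand-in for the model's text encoder.
    Embeddings are assumed L2-normalized, so a dot product equals cosine similarity.
    """
    prompts = build_prompts(labels, template)
    scores = [sum(x * y for x, y in zip(image_emb, encode_text(p))) for p in prompts]
    best = max(range(len(labels)), key=lambda i: scores[i])
    return labels[best]


# Toy encoder: a lookup table from prompt to a fixed, normalized 2-D embedding.
toy_table = {
    "이것은 고양이 이다": [1.0, 0.0],  # "This is a cat."
    "이것은 강아지 이다": [0.0, 1.0],  # "This is a puppy."
}
print(build_prompts(["고양이", "강아지"]))
print(zero_shot_classify([0.6, 0.8], ["고양이", "강아지"], toy_table.__getitem__))  # 강아지
```

The template changes only the text side of the comparison. Image embeddings can therefore be reused unchanged while different prompt templates are tried.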

## Future Work

Due to time and resource constraints, we have yet to compare KoCLIP with other open-source baselines, such as [M-CLIP](https://huggingface.co/M-CLIP). We hope to benchmark KoCLIP on various metrics and evaluation datasets to better assess its performance and reliability. In addition, prompting remains somewhat of a mysterious trick and an active area of research. We therefore hope to explore a more systematic, scientific approach to prompt engineering.

---

We thank the teams at Hugging Face and Google for arranging this wonderful opportunity. It has been a busy yet enormously rewarding week for all of us. We hope you enjoy the demo!