updating the readme
introduction.md CHANGED (+14 -2)
@@ -70,7 +70,7 @@ Our implementation is available online [here](https://github.com/clip-italian/cl
 ### Backbone Freezing

 The ViT used by OpenAI was already trained on 400 million images and it is the element in our architecture that probably required the least training.
-The same is true for the BERT model we use. To allow the randomly initialized Re-projection Layers to warm up without messing with the tuned weights of the backbones we decided to do a first training with the backbones of our architecture completely frozen. Only after these layers converged
+The same is true for the BERT model we use. To allow the randomly initialized Re-projection Layers to warm up without messing with the tuned weights of the backbones we decided to do a first training with the backbones of our architecture completely frozen. Only after these layers converged we unfroze the rest of the model to fine-tune all the components. This technique allowed us to reach a much better validation loss.

 <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/clip-italian.png" alt="drawing" width="50%"/>

@@ -139,21 +139,33 @@ then there is its (partial) counting ability and finally the ability of understa
 Look at the following - slightly cherry picked (but not even that much) - examples:

 ### Colors
+Here's a blue flower
 <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/fiore_giallo.png" alt="drawing" width="600"/>
+
+And here's a yellow flower
 <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/fiore_blu.png" alt="drawing" width="600"/>

 ### Counting
+What about "one cat"?
 <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/gatto.png" alt="drawing" width="600"/>
+
+And what about "two cats"?
 <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/due_gatti.png" alt="drawing" width="600"/>

 ### Complex Queries
+Have you ever seen "two brown horses"?
 <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/due_cavalli_marroni.png" alt="drawing" width="600"/>
+And finally, here's a very nice "cat on a chair"
 <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/gatto_su_sedia.png" alt="drawing" width="600"/>


 # Broader Outlook

-We believe that this model can be useful for many different applications
+We believe that this model can be useful for many different applications. From image classification
+to clustering, a model like CLIP Italian can be used to support researchers and practitioners in many different tasks.
+Indeed, not only can it be useful in research, but also in industry. A very interesting use case is given by e-commerce platforms:
+these websites often deal with a main source of text that is the query engine and with lots of images of the products. CLIP Italian
+can be a killer app in this context, providing a way to search for images and text. Moreover, Italy has many different collections
 of photos in digital format. For example, the [Istituto Luce Cinecittà](https://it.wikipedia.org/wiki/Istituto_Luce_Cinecitt%C3%A0) is an Italian governmental entity that has collected photos of Italy since the
 early 1900s and it is part of the largest movie studios in Europe (Cinecittà).

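
The two-stage schedule described under Backbone Freezing (warm up the new layers with the backbones frozen, then unfreeze and fine-tune everything) can be illustrated with a toy sketch. This is not the project's training code: the scalar "backbone" and "projection" weights, the single training example, and the helper names `loss`, `grads`, `train`, and `two_stage_training` are all hypothetical, chosen only to show how a set of trainable names can gate which parameters receive updates.

```python
from typing import Dict, Set, Tuple

Params = Dict[str, float]

# Single toy training example (input, target); purely illustrative.
X, Y = 1.0, 3.0


def loss(params: Params) -> float:
    """Squared error of the toy model: prediction = projection * backbone * x."""
    return (params["projection"] * params["backbone"] * X - Y) ** 2


def grads(params: Params) -> Params:
    """Analytic gradients of the squared error with respect to each parameter."""
    residual = params["projection"] * params["backbone"] * X - Y
    return {
        "backbone": 2.0 * residual * params["projection"] * X,
        "projection": 2.0 * residual * params["backbone"] * X,
    }


def train(params: Params, trainable: Set[str], steps: int, lr: float = 0.05) -> Params:
    """Run plain gradient descent, updating only the names listed in `trainable`."""
    for _ in range(steps):
        g = grads(params)
        params = {
            name: value - lr * g[name] if name in trainable else value
            for name, value in params.items()
        }
    return params


def two_stage_training() -> Tuple[Params, Params]:
    # Stand-ins for a pretrained backbone weight and a poorly initialized projection.
    init = {"backbone": 2.0, "projection": 0.1}
    # Stage 1: the backbone is frozen, so only the projection warms up.
    warm = train(init, trainable={"projection"}, steps=5)
    # Stage 2: unfreeze everything and fine-tune all parameters together.
    final = train(warm, trainable={"backbone", "projection"}, steps=20)
    return warm, final
```

Calling `two_stage_training()` returns the parameters after each stage. In a real setup the same idea is usually expressed through the optimizer or framework (for example by masking gradients per parameter group) rather than by hand as done here.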