Spaces: clip-italian/clip-italian-demo

vinid committed on Jul 19, 2021

Commit fc8611a

1 Parent(s): a5b18fc

updating the readme

Files changed (1)

introduction.md +14 -2

introduction.md CHANGED

@@ -70,7 +70,7 @@ Our implementation is available online [here](https://github.com/clip-italian/cl
 ### Backbone Freezing
 The ViT used by OpenAI was already trained on 400million images and it is the element in our architecture that probably required less training.
-The same is true for the BERT model we use. To allow the randomly initialized Re-projection Layers to warm up without messing with the tuned weights of the backbones we decided to do a first training with the backbones of our architecture completely frozen. Only after these layers converged did we unfreeze the rest of the model to fine-tune all the components. This technique allowed us to reach a much better validation loss.
+The same is true for the BERT model we use. To allow the randomly initialized Re-projection Layers to warm up without messing with the tuned weights of the backbones we decided to do a first training with the backbones of our architecture completely frozen. Only after these layers converged we unfreezed the rest of the model to fine-tune all the components. This technique allowed us to reach a much better validation loss.
 <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/clip-italian.png" alt="drawing" width="50%"/>
@@ -139,21 +139,33 @@ then there is its (partial) counting ability and finally the ability of understa
 Look at the following - slightly cherry picked (but not even that much) - examples:
 ### Colors
+Here's a blu flower
 <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/fiore_giallo.png" alt="drawing" width="600"/>
+And here's a yellow flower
 <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/fiore_blu.png" alt="drawing" width="600"/>
 ### Counting
+What about "one cat"
 <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/gatto.png" alt="drawing" width="600"/>
+And what about "two cats"?
 <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/due_gatti.png" alt="drawing" width="600"/>
 ### Complex Queries
+Have you ever seen "two brown horses"?
 <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/due_cavalli_marroni.png" alt="drawing" width="600"/>
+And finally, here's a very nice "cat on a chair"
 <img src="https://huggingface.co/spaces/clip-italian/clip-italian-demo/raw/main/static/img/gatto_su_sedia.png" alt="drawing" width="600"/>
 # Broader Outlook
-We believe that this model can be useful for many different applications, not only in research settings. Italy has many different collections
+We believe that this model can be useful for many different applications. From image classification
+to clustering, a model like CLIP Italian can be used to support researchers and practitioners in many different tasks.
+Indeed, not only it can be useful in research, but also in industry. A very interesting use-case is given by ecommerce platforms:
+these website often deal with a main source of text that is the query engine and with lots of images of the products. CLIP Italian
+can be a killer app in this context, providing a way to search for images and text. Nonetheless, Italy has many different collections
 of photos in digital format. For example, the [Istituto Luce Cinecittà](https://it.wikipedia.org/wiki/Istituto_Luce_Cinecitt%C3%A0) is an Italian governative entity that collects photos of Italy since the
 early 1900 and it is part of the largest movie studios in Europe (Cinecittà).
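The Backbone Freezing paragraph in this change describes a two-phase schedule: first train only the randomly initialized re-projection layers while the ViT and BERT backbones stay frozen, then unfreeze everything and fine-tune all components. The sketch below illustrates that schedule in plain Python under explicit assumptions: the parameter-group names, the plain SGD update, and the fixed `unfreeze_at` step are hypothetical and are not taken from the CLIP-Italian code (the README says the switch happened once the projection layers converged, not at a fixed step). In a JAX/Optax training loop, a comparable effect could be obtained by labeling parameters and combining `optax.multi_transform` with `optax.set_to_zero()` for the frozen groups.

```python
from typing import Dict, List, Set

Params = Dict[str, List[float]]

# Hypothetical parameter-group names; real models use nested parameter trees.
BACKBONE_GROUPS: Set[str] = {"vision_backbone", "text_backbone"}
PROJECTION_GROUPS: Set[str] = {"vision_projection", "text_projection"}


def trainable_groups(step: int, unfreeze_at: int) -> Set[str]:
    """Phase 1 (step < unfreeze_at): only the re-projection layers are trained.
    Phase 2 (step >= unfreeze_at): backbones are unfrozen and everything is fine-tuned.
    The fixed switch step is an assumption made for illustration."""
    if step < unfreeze_at:
        return set(PROJECTION_GROUPS)
    return BACKBONE_GROUPS | PROJECTION_GROUPS


def sgd_step(params: Params, grads: Params, lr: float, active: Set[str]) -> Params:
    """Return new parameters: active groups get a plain SGD update,
    frozen groups are copied unchanged (their gradients are ignored)."""
    updated: Params = {}
    for name, values in params.items():
        if name in active:
            updated[name] = [p - lr * g for p, g in zip(values, grads[name])]
        else:
            updated[name] = list(values)
    return updated


# Toy run: every group starts at 1.0 and always receives a gradient of 0.5.
params: Params = {name: [1.0] for name in BACKBONE_GROUPS | PROJECTION_GROUPS}
grads: Params = {name: [0.5] for name in params}
for step in range(4):
    params = sgd_step(params, grads, lr=0.5, active=trainable_groups(step, unfreeze_at=2))
print(params["vision_backbone"], params["vision_projection"])  # prints [0.5] [0.0]
```

During the first two steps only the projection groups move, so after four steps the backbones have received two updates and the projections four. Returning a new dictionary instead of mutating the input keeps `sgd_step` side-effect free.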
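The Colors, Counting, and Complex Queries examples, as well as the e-commerce search use case in Broader Outlook, rely on ranking images by how well they match a text query. CLIP-style models typically score a text-image pair by the cosine similarity of their L2-normalized embeddings. The sketch below shows only that ranking step; the toy three-dimensional vectors and file names are hypothetical stand-ins for the outputs of the text and image encoders, which are not reproduced here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def rank_images(
    query_embedding: Sequence[float],
    image_embeddings: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (image_id, cosine similarity) pairs, best match first."""
    query = l2_normalize(query_embedding)
    scored = []
    for image_id, embedding in image_embeddings.items():
        image = l2_normalize(embedding)
        score = sum(q * i for q, i in zip(query, image))
        scored.append((image_id, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings standing in for real image-encoder outputs.
catalog = {
    "yellow_flower.jpg": [0.9, 0.1, 0.0],
    "blue_flower.jpg": [0.1, 0.9, 0.0],
    "cat_on_chair.jpg": [0.0, 0.2, 0.9],
}
# Hypothetical text-encoder output for the query "un fiore giallo".
query_vector = [1.0, 0.0, 0.0]

for image_id, score in rank_images(query_vector, catalog, top_k=2):
    print(f"{image_id}: {score:.3f}")
```

In a search setting such as an e-commerce catalog, the image embeddings can be computed once and stored, so only the query has to be encoded at search time.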