<br />
<p align="center">
<h1 align="center">Swe-CLIP 2M</h1>
<p align="center">
<a href="https://huggingface.co./M-CLIP/Swedish-2M">Huggingface Model</a>
·
<a href="https://huggingface.co./KB/bert-base-swedish-cased">Huggingface Base Model</a>
</p>
</p>
## Usage
To use this model together with the original CLIP vision encoder, follow the [main page usage instructions](https://github.com/FreddeFrallan/Multilingual-CLIP) to download the additional linear weights.
Once this is done, you can load and use the model with the following code:
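```python
from multilingual_clip import multilingual_clip

# Load the Swedish text encoder together with the downloaded linear weights
model = multilingual_clip.load_model('Swe-CLIP-2M')

# Embed two Swedish sentences:
# "The moose is the king of the forest!", "All polar bears are left-handed"
embeddings = model(['Älgen är skogens konung!', 'Alla isbjörnar är vänsterhänta'])
print(embeddings.shape)
# Yields: torch.Size([2, 640])
```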
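
Because the text embeddings are tuned to match the CLIP text-embedding space, they can be scored against image embeddings from the Res50x4 vision encoder with cosine similarity. The snippet below is a minimal, dependency-free sketch of that comparison. `cosine_similarity` and `rank_captions` are hypothetical helpers and not part of `multilingual_clip`, and the toy 3-dimensional vectors are assumptions that stand in for real 640-dimensional embeddings (for example `embeddings.tolist()` and an image embedding produced by CLIP).

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_captions(text_embeddings, image_embedding):
    """Return caption indices sorted from best to worst match with the image."""
    scores = [cosine_similarity(t, image_embedding) for t in text_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Toy stand-ins (assumed values): two caption embeddings and one image embedding
text_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
image_embedding = [0.2, 0.9, 0.1]

ranking = rank_captions(text_embeddings, image_embedding)
print(ranking)  # [1, 0]: the second caption is closest to the image
```

In practice the same computation would normally be done on tensors in one batched operation; the list-based version is only meant to make the scoring step explicit.
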
<!-- ABOUT THE PROJECT -->
## About
A [KB/Bert-Swedish-Cased](https://huggingface.co./KB/bert-base-swedish-cased) model tuned to match the embedding space of the CLIP text encoder that accompanies the Res50x4 vision encoder. <br>
Training data pairs were generated by sampling 2 million sentences from the combined descriptions of [GCC](https://ai.google.com/research/ConceptualCaptions/) + [MSCOCO](https://cocodataset.org/#home) + [VizWiz](https://vizwiz.org/tasks-and-datasets/image-captioning/) and translating them into Swedish.
All translation was done using the [Huggingface Opus Model](https://huggingface.co./Helsinki-NLP/opus-mt-en-sv), which seemingly produces higher-quality translations than the [AWS translate service](https://aws.amazon.com/translate/).
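
The tuning described above can be pictured as a teacher-student setup: the CLIP text encoder embeds the original English sentence, the Swedish BERT plus a linear layer embeds the translation, and training pulls the two vectors together. The sketch below assumes a mean-squared-error objective, which this section does not state, and uses toy dimensions (3 to 2) in place of the real sizes (BERT-base's 768-dimensional hidden state projected to the 640-dimensional output shown in the usage example). `project` and `mse_loss` are hypothetical helpers, and all numbers are made up for illustration.

```python
def project(weights, bias, hidden):
    """Linear layer mapping a pooled student vector into the teacher's space."""
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(weights, bias)]

def mse_loss(student, teacher):
    """Mean squared error between student and teacher embeddings (assumed objective)."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(teacher)

# Assumed toy values: pooled Swedish BERT output and CLIP embedding of the English source
pooled_swedish = [0.5, -1.0, 2.0]
clip_english = [1.0, 0.0]

# Assumed toy linear weights (2 x 3) and bias
weights = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.5]]
bias = [0.0, 0.0]

student = project(weights, bias, pooled_swedish)
loss = mse_loss(student, clip_english)
print(student, loss)  # [0.5, 0.0] 0.125
```

A training loop would minimize this loss over all 2 million sentence pairs by updating the student's parameters while the CLIP teacher stays fixed.
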