File size: 1,545 Bytes
d0762f1 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 |
<br />
<p align="center">
<h1 align="center">Swe-CLIP 2M</h1>
<p align="center">
<a href="https://github.com/FreddeFrallan/Multilingual-CLIP/tree/main/Model%20Cards/Swe-CLIP%202M">Github Model Card</a>
</p>
</p>
## Usage
To use this model along with the original CLIP vision encoder you need to download the code and additional linear weights from the [Multilingual-CLIP Github](https://github.com/FreddeFrallan/Multilingual-CLIP).
Once this is done, you can load and use the model with the following code
```python
from src import multilingual_clip
model = multilingual_clip.load_model('Swe-CLIP-500k')
embeddings = model(['Älgen är skogens konung!', 'Alla isbjörnar är vänsterhänta'])
print(embeddings.shape)
# Yields: torch.Size([2, 640])
```
<!-- ABOUT THE PROJECT -->
## About
A [KB/Bert-Swedish-Cased](https://huggingface.co./KB/bert-base-swedish-cased) tuned to match the embedding space of the CLIP text encoder which accompanies the Res50x4 vision encoder. <br>
Training data pairs was generated by sampling 2 Million sentences from the combined descriptions of [GCC](https://ai.google.com/research/ConceptualCaptions/) + [MSCOCO](https://cocodataset.org/#home) + [VizWiz](https://vizwiz.org/tasks-and-datasets/image-captioning/), and translating them into Swedish.
All translation was done using the [Huggingface Opus Model](https://huggingface.co./Helsinki-NLP/opus-mt-en-sv), which seemingly procudes higher quality translations than relying on the [AWS translate service](https://aws.amazon.com/translate/).
|