# Swe-CLIP 500k

Huggingface Model · Huggingface Base Model

## Usage
To use this model along with the original CLIP vision encoder, follow the [main page usage instructions](https://github.com/FreddeFrallan/Multilingual-CLIP) to download the additional linear weights.
Once this is done, you can load and use the model with the following code:
```python
from multilingual_clip import multilingual_clip

model = multilingual_clip.load_model('Swe-CLIP-500k')
embeddings = model(['Älgen är skogens konung!', 'Alla isbjörnar är vänsterhänta'])
print(embeddings.shape)
# Yields: torch.Size([2, 640])
```

## About
A [KB/Bert-Swedish-Cased](https://huggingface.co./KB/bert-base-swedish-cased) tuned to match the embedding space of the CLIP text encoder which accompanies the Res50x4 vision encoder.
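
Because the text encoder is tuned to the CLIP text embedding space, its outputs should be comparable with image embeddings from the matching Res50x4 vision encoder. The following is a minimal sketch of that comparison using cosine similarity, not code from this repository. It assumes the embeddings have already been computed and converted to plain lists of floats (for example with `.tolist()`); the helper names `cosine_similarity` and `rank_captions` are hypothetical, and the toy 3-dimensional vectors stand in for the real 640-dimensional ones.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_captions(image_embedding, caption_embeddings):
    """Return caption indices ordered from best to worst match."""
    scores = [cosine_similarity(image_embedding, c) for c in caption_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy stand-ins (assumption): real embeddings would have 640 dimensions.
image_embedding = [0.9, 0.1, 0.0]
caption_embeddings = [
    [0.0, 1.0, 0.0],  # points in a different direction than the image
    [1.0, 0.2, 0.0],  # points in nearly the same direction as the image
]

best_first = rank_captions(image_embedding, caption_embeddings)
print(best_first)
```

Cosine similarity ignores vector length, so the ranking depends only on direction in the shared space.
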
Training data pairs were generated by sampling 500k sentences from the combined descriptions of [GCC](https://ai.google.com/research/ConceptualCaptions/) + [MSCOCO](https://cocodataset.org/#home) + [VizWiz](https://vizwiz.org/tasks-and-datasets/image-captioning/) and translating them into Swedish. All translation was done using the [Huggingface Opus Model](https://huggingface.co./Helsinki-NLP/opus-mt-en-sv), which seemingly produces higher-quality translations than the [AWS translate service](https://aws.amazon.com/translate/).
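
The data preparation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script used to build the dataset: the function name `build_training_pairs` is hypothetical, sampling is assumed to be uniform without replacement over the pooled captions, and `translate` is a placeholder callable that in practice would wrap the Opus English-to-Swedish model.

```python
import random


def build_training_pairs(caption_sources, sample_size, translate, seed=0):
    """Pool captions from all sources, sample some, and pair each with its translation.

    Assumption: uniform sampling without replacement; the original
    sampling strategy is not specified.
    """
    pooled = [caption for source in caption_sources for caption in source]
    if sample_size > len(pooled):
        raise ValueError("sample_size exceeds the number of available captions")
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    sampled = rng.sample(pooled, sample_size)
    return [(english, translate(english)) for english in sampled]


# Toy stand-ins for the GCC, MSCOCO and VizWiz caption collections.
gcc = ["a dog runs on the beach", "a city skyline at night"]
mscoco = ["two people riding bikes"]
vizwiz = ["a bottle of shampoo on a shelf"]


def placeholder_translate(text):
    # Hypothetical stand-in for a real English-to-Swedish translation call.
    return "[sv] " + text


pairs = build_training_pairs(
    [gcc, mscoco, vizwiz], sample_size=3, translate=placeholder_translate
)
for english, swedish in pairs:
    print(english, "->", swedish)
```

Passing the translator in as a callable keeps the sampling step independent of the translation backend, so it can be swapped without touching the rest of the pipeline.
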