FreddeFrallan committed
Commit: ff85d1d
Parent(s): 040db0b
Create README.md

README.md CHANGED
@@ -0,0 +1,28 @@
<br />
<p align="center">
  <h1 align="center">Swe-CLIP 500k</h1>

  <p align="center">
    <a href="https://github.com/FreddeFrallan/Multilingual-CLIP/tree/main/Model%20Cards/Swe-CLIP%20500k">Github Model Card</a>
  </p>
</p>


## Usage
To use this model together with the original CLIP vision encoder, you need to download the code and the additional linear weights from the [Multilingual-CLIP Github](https://github.com/FreddeFrallan/Multilingual-CLIP).
Once this is done, you can load and use the model with the following code:
```python
from src import multilingual_clip

model = multilingual_clip.load_model('Swe-CLIP-500k')

# Swedish example sentences: "The moose is the king of the forest!",
# "All polar bears are left-handed"
embeddings = model(['Älgen är skogens konung!', 'Alla isbjörnar är vänsterhänta'])
print(embeddings.shape)
# Yields: torch.Size([2, 640])
```
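Since the text embeddings are tuned to live in the same space as the CLIP Res50x4 image embeddings, they can be compared with image features directly. The snippet below is only a sketch of that pairing and is not part of the original card: it assumes OpenAI's `clip` package is installed, runs everything on CPU, and uses a placeholder image file `dog.jpg` and caption.

```python
# Sketch: pairing the Swedish text encoder with the matching CLIP RN50x4
# vision encoder for image-text similarity. `dog.jpg` and the caption are
# placeholder examples, not assets from this repository.
import clip
import torch
from PIL import Image

from src import multilingual_clip

text_model = multilingual_clip.load_model('Swe-CLIP-500k')
image_model, preprocess = clip.load('RN50x4', device='cpu')

# Encode a Swedish caption ("A dog playing in the snow") and an image
# into the shared 640-dimensional space.
text_emb = text_model(['En hund som leker i snön'])
image = preprocess(Image.open('dog.jpg')).unsqueeze(0)
with torch.no_grad():
    image_emb = image_model.encode_image(image)

# Cosine similarity between the text and image embeddings.
similarity = torch.nn.functional.cosine_similarity(text_emb, image_emb.float())
print(similarity)
```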

<!-- ABOUT THE PROJECT -->
## About
A [KB/Bert-Swedish-Cased](https://huggingface.co/KB/bert-base-swedish-cased) model tuned to match the embedding space of the CLIP text encoder that accompanies the Res50x4 vision encoder. <br>

Training data pairs were generated by sampling 500k sentences from the combined descriptions of [GCC](https://ai.google.com/research/ConceptualCaptions/) + [MSCOCO](https://cocodataset.org/#home) + [VizWiz](https://vizwiz.org/tasks-and-datasets/image-captioning/) and translating them into Swedish.
All translation was done with the [Huggingface Opus Model](https://huggingface.co/Helsinki-NLP/opus-mt-en-sv), which seemingly produces higher-quality translations than the [AWS translate service](https://aws.amazon.com/translate/).
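For intuition, the tuning described above can be read as a teacher-student setup: the frozen CLIP text encoder embeds the original English caption, and the Swedish model is trained to produce the same vector for the translated caption. The sketch below only illustrates that idea; the loss choice, helper names, and `student` model (any trainable Swedish text encoder with a 640-dimensional output) are assumptions, not the actual training code, which lives in the Multilingual-CLIP repository.

```python
# Rough sketch of the teacher-student objective described above
# (hypothetical; see the Multilingual-CLIP repository for the real setup).
import clip
import torch

# Teacher: the frozen CLIP text encoder accompanying the RN50x4 vision encoder.
teacher, _ = clip.load('RN50x4', device='cpu')
teacher.eval()

def teacher_embed(english_captions):
    """Embed the original English captions with the frozen CLIP text encoder."""
    tokens = clip.tokenize(english_captions)
    with torch.no_grad():
        return teacher.encode_text(tokens).float()

def training_step(student, optimizer, english_captions, swedish_translations):
    """One step: push the Swedish embeddings toward the CLIP embeddings
    of the corresponding English captions. `student` is a hypothetical
    trainable Swedish text encoder returning [batch, 640] tensors."""
    target = teacher_embed(english_captions)       # [batch, 640]
    prediction = student(swedish_translations)     # [batch, 640]
    loss = torch.nn.functional.mse_loss(prediction, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```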