# M-BERT Base 69


## Usage

To use this model along with the original CLIP vision encoder, follow the [main page usage instructions](https://github.com/FreddeFrallan/Multilingual-CLIP) to download the additional linear weights. Once this is done, you can load and use the model with the following code:

```python
from multilingual_clip import multilingual_clip

model = multilingual_clip.load_model('M-BERT-Base-69')
embeddings = model([
    'Älgen är skogens konung!',                # Swedish: 'The moose is the king of the forest!'
    'Wie leben Eisbären in der Antarktis?',    # German: 'How do polar bears live in Antarctica?'
    'Вы знали, что все белые медведи левши?',  # Russian: 'Did you know that all polar bears are left-handed?'
])
print(embeddings.shape)
# Yields: torch.Size([3, 640])
```
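As a further illustration (not part of the official instructions), the sketch below pairs these text embeddings with the matching CLIP image encoder for simple image-text similarity. The image file name and captions are placeholders, and `RN50x4` is the identifier the `clip` package (https://github.com/openai/CLIP) uses for the Res50x4 encoder:

```python
# Minimal sketch of image-text matching with M-BERT-Base-69 text
# embeddings and CLIP's RN50x4 image encoder. Assumes the `clip`
# package and an illustrative local image `dog.jpg`.
import clip
import torch
from PIL import Image
from multilingual_clip import multilingual_clip

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# RN50x4 image features are 640-dimensional, matching the
# 640-dimensional text embeddings produced by M-BERT-Base-69.
clip_model, preprocess = clip.load('RN50x4', device=device)
text_model = multilingual_clip.load_model('M-BERT-Base-69')

image = preprocess(Image.open('dog.jpg')).unsqueeze(0).to(device)
with torch.no_grad():
    image_features = clip_model.encode_image(image).float().cpu()
    text_features = text_model(['En hund som leker i gräset',  # Swedish
                                'Ein Hund spielt im Gras'])     # German

# Cosine similarity between each caption and the image.
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)
print(text_features @ image_features.T)
```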
## About

A [bert-base-multilingual](https://huggingface.co/bert-base-multilingual-cased) tuned so that, for 69 languages, its embedding space matches that of the CLIP text encoder which accompanies the Res50x4 vision encoder.

A full list of the 104 languages used during pre-training can be found [here](https://github.com/google-research/bert/blob/master/multilingual.md#list-of-languages), and a list of the 69 languages used during fine-tuning can be found in [Fine-Tune-Languages.md](Fine-Tune-Languages.md).

Training data pairs were generated by sampling 40k sentences per language from the combined descriptions of [GCC](https://ai.google.com/research/ConceptualCaptions/) + [MSCOCO](https://cocodataset.org/#home) + [VizWiz](https://vizwiz.org/tasks-and-datasets/image-captioning/), and translating them into the corresponding language. All translation was done using the [AWS Translate service](https://aws.amazon.com/translate/). The quality of these translations has not yet been analyzed, but one can assume it varies across the 69 languages.
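As an illustration of that data-generation step, a translation pipeline along these lines could be built on boto3; the sampling logic, helper name, and language codes below are assumptions, not the authors' actual code:

```python
# Illustrative sketch of generating caption translation pairs with the
# AWS Translate service via boto3. This is NOT the authors' pipeline;
# `make_pairs` is a hypothetical helper, and credentials/region are
# assumed to be configured in the environment.
import random
import boto3

translate = boto3.client('translate')

def make_pairs(english_captions, target_lang, n_samples=40_000):
    """Sample captions and pair each with its machine translation."""
    sampled = random.sample(english_captions, min(n_samples, len(english_captions)))
    pairs = []
    for caption in sampled:
        response = translate.translate_text(
            Text=caption,
            SourceLanguageCode='en',
            TargetLanguageCode=target_lang,
        )
        pairs.append((caption, response['TranslatedText']))
    return pairs

# e.g. swedish_pairs = make_pairs(combined_captions, 'sv')
# would yield up to 40k (English, Swedish) caption pairs.
```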
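These translated pairs then drive the tuning itself. The following is a minimal sketch of such an alignment objective, assuming mean pooling over mBERT's final hidden states, a single linear projection into CLIP's 640-dimensional text space, and an MSE loss against the frozen CLIP text encoder; it reflects one reading of the setup rather than the authors' training code:

```python
# Sketch of the embedding-space alignment: a projection head on top of
# mBERT is trained so that a translated caption's embedding matches the
# frozen CLIP text encoder's embedding of the original English caption.
# Pooling choice, head shape, and hyperparameters are assumptions.
import clip
import torch
from transformers import AutoModel, AutoTokenizer

device = 'cuda' if torch.cuda.is_available() else 'cpu'

tokenizer = AutoTokenizer.from_pretrained('bert-base-multilingual-cased')
bert = AutoModel.from_pretrained('bert-base-multilingual-cased').to(device)
head = torch.nn.Linear(768, 640).to(device)  # project to RN50x4's text dim

clip_model, _ = clip.load('RN50x4', device=device)  # frozen teacher
optimizer = torch.optim.Adam(
    list(bert.parameters()) + list(head.parameters()), lr=1e-5)

def train_step(translated_captions, english_captions):
    # Student: mean-pooled mBERT hidden states -> linear projection.
    batch = tokenizer(translated_captions, padding=True, truncation=True,
                      return_tensors='pt').to(device)
    student = head(bert(**batch).last_hidden_state.mean(dim=1))

    # Teacher: frozen CLIP text embeddings of the English originals.
    with torch.no_grad():
        teacher = clip_model.encode_text(
            clip.tokenize(english_captions).to(device)).float()

    loss = torch.nn.functional.mse_loss(student, teacher)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```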