LaBSE for French and German
This is a shortened version of sentence-transformers/LaBSE. The model was prepaired with the direct help of cointegrated, the author of the LaBSE-en-ru model.
The current model includes only French and German tokens, and the vocabulary is thus 10% of the original while number of parameters in the whole model is 27% of the original.
To get the sentence embeddings, you can use the following code:
import torch
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("EIStakovskii/LaBSE-fr-de")
model = AutoModel.from_pretrained("EIStakovskii/LaBSE-fr-de")
sentences = ["Wie geht es dir?", "Comment vas-tu?"]
encoded_input = tokenizer(sentences, padding=True, truncation=True, max_length=64, return_tensors='pt')
with torch.no_grad():
model_output = model(**encoded_input)
embeddings = model_output.pooler_output
embeddings = torch.nn.functional.normalize(embeddings)
print(embeddings)
Reference:
Fangxiaoyu Feng, Yinfei Yang, Daniel Cer, Narveen Ari, Wei Wang. Language-agnostic BERT Sentence Embedding. July 2020
License: https://tfhub.dev/google/LaBSE/1
- Downloads last month
- 112
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.