---
license: apache-2.0
---

This model is a merge of [bge-small-en-v1.5](https://huggingface.co./BAAI/bge-small-en-v1.5), [GIST-Embedding-v0](https://huggingface.co./avsolatorio/GIST-Embedding-v0), and [gte-base](https://huggingface.co./thenlper/gte-base). It focuses on retrieval tasks while also performing well on a variety of other tasks (see experiment details below).

## Usage

### For retrieval tasks

```python
from transformers import AutoTokenizer, AutoModel
import torch

# Sentences we want sentence embeddings for
sentences = ["this is a test sentence", "this is another test sentence"]

# Prefix used for retrieval tasks
instruction = "Represent this sentence for searching relevant passages: "

# Load model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained('Marqo/Slerp_merged_109M')
model = AutoModel.from_pretrained('Marqo/Slerp_merged_109M')
model.eval()

# Tokenize sentences, both with and without the retrieval prefix
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
encoded_input_with_prefixing = tokenizer([instruction + s for s in sentences], padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)
    model_output_with_prefixing = model(**encoded_input_with_prefixing)

# Perform pooling (CLS pooling) on each output, then average the two embeddings
embeddings = model_output.last_hidden_state[:, 0]
embeddings_with_prefixing = model_output_with_prefixing.last_hidden_state[:, 0]
sentence_embeddings = (embeddings + embeddings_with_prefixing) / 2

# Normalize embeddings
sentence_embeddings = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)
print("Sentence embeddings:", sentence_embeddings)
```
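For retrieval, embeddings like these are typically compared with cosine similarity to rank passages against a query. The sketch below reuses the `tokenizer`, `model`, and `instruction` loaded above and follows the averaging scheme from the snippet (embedding each text with and without the prefix, then averaging the CLS embeddings); the `embed` helper and the example query/passages are illustrative assumptions, not part of the original card.

```python
import torch

def embed(texts):
    """Illustrative helper (not an official API): average the prefixed and
    unprefixed CLS embeddings, following the scheme shown above."""
    def cls_embed(batch):
        inputs = tokenizer(batch, padding=True, truncation=True, return_tensors='pt')
        with torch.no_grad():
            output = model(**inputs)
        return output.last_hidden_state[:, 0]  # CLS pooling

    plain = cls_embed(texts)
    prefixed = cls_embed([instruction + t for t in texts])
    return torch.nn.functional.normalize((plain + prefixed) / 2, p=2, dim=1)

query_embeddings = embed(["what is a test sentence?"])
passage_embeddings = embed(["this is a test sentence", "this is another test sentence"])

# Because the embeddings are L2-normalized, the dot product equals cosine similarity
scores = query_embeddings @ passage_embeddings.T
print("Similarity scores:", scores)
```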