Marqo/marqo-merged-bge-gist-gte-base

t0b1as91 committed on Aug 28, 2024

Commit bdc00f6 · verified · 1 Parent(s): c049e26

Update README.md

Files changed (1)

README.md +26 -0

@@ -9,5 +9,31 @@ This model focuses on retrieval tasks while also performing well on various task
 ## For retrieval tasks
 ```python
+from transformers import AutoTokenizer, AutoModel
+import torch
+# Sentences we want sentence embeddings for
+sentences = ["this is a test sentence", "this is another test sentence"]
+# Prefixing for retrieval tasks
+instruction = "Represent this sentence for searching relevant passages: "
+# Load model from HuggingFace Hub
+tokenizer = AutoTokenizer.from_pretrained('Marqo/Slerp_merged_109M')
+model = AutoModel.from_pretrained('Marqo/Slerp_merged_109M')
+model.eval()
+# Tokenize sentences
+encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
+encoded_input_with_prefixing = tokenizer([instruction + s for s in sentences], padding=True, truncation=True, return_tensors='pt')
+# Compute token embeddings
+with torch.no_grad():
+    model_output = model(**encoded_input)
+    model_output_with_prefixing = model(**encoded_input_with_prefixing)
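+    # Perform CLS pooling first: output objects cannot be added, and the prefixed batch has a different sequence length.
+    cls_emb, cls_emb_with_prefixing = model_output[0][:, 0], model_output_with_prefixing[0][:, 0]
+    sentence_embeddings = (cls_emb + cls_emb_with_prefixing) / 2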
+# normalize embeddings
+sentence_embeddings = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)
+print("Sentence embeddings:", sentence_embeddings)
 ```
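
The snippet above ends with L2-normalized embeddings but does not show how they might be used for retrieval. A minimal sketch follows, assuming the common approach of ranking passages by cosine similarity, which for unit-length vectors equals the dot product. The helper names `l2_normalize` and `rank_passages` and the toy 3-dimensional vectors are hypothetical and stand in for real model output; this is not taken from the model card.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def rank_passages(query_emb, passage_embs):
    """Return (indices sorted by descending cosine similarity, raw scores)."""
    q = l2_normalize(query_emb)
    # After normalization, the dot product is the cosine similarity.
    scores = [sum(a * b for a, b in zip(q, l2_normalize(p))) for p in passage_embs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores

# Toy vectors (made up) standing in for real sentence embeddings.
query = [1.0, 0.0, 0.0]
passages = [[0.0, 1.0, 0.0], [2.0, 0.0, 0.0], [1.0, 1.0, 0.0]]

order, scores = rank_passages(query, passages)
print("Ranking (best first):", order)
```

With real embeddings, the same ranking could be computed in one step as a matrix product of the normalized query and passage tensors.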