Integrate with Sentence Transformers (+ third parties like LangChain/Haystack/LlamaIndex, etc.)

#1
by tomaarsen - opened

Hello!

Pull Request overview

  • Integrate with Sentence Transformers
  • Add "how to run" snippet
  • Add transformers and sentence-transformers tags

Details

First of all, congratulations on your model release! Always good to see more large embedding models, and I'm looking forward to the training dataset and recipe.

With this PR I'm proposing to add the configuration files required for Sentence Transformers, and therefore also for the related projects that rely on ST (LangChain, Haystack, LlamaIndex, etc.). In particular, this involves adding a few configuration files, e.g. ones specifying which pooling method to use (last token in this case), what maximum sequence length to use, etc. With these in place, the new snippet in the README becomes a very simple way to use this model.
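For reference, here is a rough sketch of what that configuration corresponds to if you were to assemble the model programmatically. The repo ID and max sequence length below are placeholders; the actual values live in the config files added in this PR:

from sentence_transformers import SentenceTransformer, models

# Placeholder repo ID and sequence length; the real values come from the added config files
word_embedding_model = models.Transformer("<this-repo-id>", max_seq_length=512)

# Last-token pooling, as configured in this PR
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),
    pooling_mode="lasttoken",
)

model = SentenceTransformer(modules=[word_embedding_model, pooling_model])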
You can also rely on an instruction prompt for the queries, e.g.:

task_description = "Given a claim about climate change, retrieve documents that support or refute the claim"
prompt = f'Instruct: {task_description}\nQuery:'

queries = [
    "In Alaska, brown bears are changing their feeding habits to eat elderberries that ripen earlier.",
    "Local and regional sea levels continue to exhibit typical natural variability—in some places rising and in others falling."
]
query_embeddings = model.encode(queries, prompt=prompt)
passage_embeddings = model.encode(passages)

scores = model.similarity(query_embeddings, passage_embeddings)

But I stuck with the current snippet as it resembled the transformers one more closely (i.e. with just one inference call).
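For completeness, that single-call style looks roughly like this with Sentence Transformers (a sketch, reusing model, prompt, queries and passages from the snippet above):

# Encode queries (with the instruction prepended) and passages in one call
input_texts = [prompt + query for query in queries] + passages
embeddings = model.encode(input_texts)

# Split back into query and passage embeddings, then score
query_embeddings = embeddings[:len(queries)]
passage_embeddings = embeddings[len(queries):]
scores = model.similarity(query_embeddings, passage_embeddings)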

  • Tom Aarsen
tomaarsen changed pull request status to open
Zeta Alpha org

Thanks, Tom!

ArthurCamara changed pull request status to merged
