Shitao and michaelfeil committed
Commit b29731d
1 Parent(s): 7774ef4

update readme for onnx files (#15)


- update readme for onnx files (0b933b890ac59109ede6090a4b3ef60c800d8128)


Co-authored-by: Michael <[email protected]>

Files changed (1)
  1. README.md +45 -0
README.md CHANGED
@@ -2864,6 +2864,51 @@ sentence_embeddings = torch.nn.functional.normalize(sentence_embeddings, p=2, di
  print("Sentence embeddings:", sentence_embeddings)
  ```
 
+ #### Usage of the ONNX files
+
+ ```python
+ from optimum.onnxruntime import ORTModelForFeatureExtraction  # type: ignore
+
+ import torch
+ from transformers import AutoModel, AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-large-en-v1.5')
+ model = AutoModel.from_pretrained('BAAI/bge-large-en-v1.5', revision="refs/pr/13")
+ model_ort = ORTModelForFeatureExtraction.from_pretrained('BAAI/bge-large-en-v1.5', revision="refs/pr/13", file_name="onnx/model.onnx")
+
+ # Sentences we want sentence embeddings for
+ sentences = ["sample data-1", "sample data-2"]
+
+ # Tokenize sentences
+ encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
+ # For s2p (short query to long passage) retrieval tasks, add an instruction to each query (but not to passages)
+ # encoded_input = tokenizer([instruction + q for q in queries], padding=True, truncation=True, return_tensors='pt')
+
+ model_output_ort = model_ort(**encoded_input)
+ # Compute token embeddings
+ with torch.no_grad():
+     model_output = model(**encoded_input)
+
+ # model_output and model_output_ort should match up to small floating-point differences
+
+ ```
+
+ It's also possible to deploy the ONNX files with the [infinity_emb](https://github.com/michaelfeil/infinity) pip package.
+ ```python
+ import asyncio
+ from infinity_emb import AsyncEmbeddingEngine, EngineArgs
+
+ sentences = ["Embed this sentence via Infinity.", "Paris is in France."]
+ engine = AsyncEmbeddingEngine.from_args(
+     EngineArgs(model_name_or_path="BAAI/bge-large-en-v1.5", device="cpu", engine="optimum")  # or engine="torch"
+ )
+
+ async def main():
+     async with engine:
+         embeddings, usage = await engine.embed(sentences=sentences)
+ asyncio.run(main())
+ ```
+
  ### Usage for Reranker
 
  Unlike the embedding model, the reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding.
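To make the reranker's interface concrete, here is a minimal, dependency-free sketch of how its output might be consumed. It assumes you already have one raw relevance score (a logit) per (query, passage) pair from a reranker; the scores in the example are made-up placeholders, and `rank_passages` and `sigmoid` are hypothetical helper names, not part of any library. Mapping logits to the 0-1 range with a sigmoid is shown as an optional, assumed post-processing step, not something this README prescribes.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    """Map a raw reranker logit to the (0, 1) range (numerically stable form)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def rank_passages(
    passages: List[str], logits: List[float], normalize: bool = False
) -> List[Tuple[str, float]]:
    """Sort passages for a single query, highest reranker score first.

    Hypothetical helper: logits[i] is assumed to be the reranker's raw output
    for the pair (query, passages[i]), i.e. one scalar per pair rather than
    one embedding per text.
    """
    if len(passages) != len(logits):
        raise ValueError("need exactly one score per passage")
    scores = [sigmoid(s) for s in logits] if normalize else list(logits)
    # sigmoid is monotonic, so normalize=True changes the values, not the order
    return sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical placeholder logits for one query and two candidate passages
    passages = ["an unrelated greeting", "a passage that answers the query"]
    raw_scores = [-8.0, 5.0]
    for passage, score in rank_passages(passages, raw_scores, normalize=True):
        print(f"{score:.4f}  {passage}")
```

Because the sigmoid preserves order, ranking by raw logits and ranking by normalized scores give the same result; normalization only matters if you want scores on a fixed 0-1 scale, for example to apply a threshold.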
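The ONNX snippet in the diff above stops at the raw model outputs. BGE models use the first ([CLS]) token's hidden state as the sentence vector, and the context line at the top of this hunk L2-normalizes it with `torch.nn.functional.normalize(sentence_embeddings, p=2, ...)`. The dependency-free sketch below reproduces that pooling and normalization on nested lists and adds a tolerance-based comparison of the kind you might use to confirm that ONNX Runtime and PyTorch outputs agree. The helper names and the `1e-4` tolerance are assumptions for illustration, not values taken from the README.

```python
import math
from typing import List

Matrix = List[List[float]]  # [rows][hidden]
Batch = List[Matrix]        # [batch][seq_len][hidden]


def cls_pool(last_hidden_state: Batch) -> Matrix:
    """Take the first ([CLS]) token vector of every sequence in the batch."""
    return [list(sequence[0]) for sequence in last_hidden_state]


def l2_normalize(vectors: Matrix, eps: float = 1e-12) -> Matrix:
    """Divide each vector by max(L2 norm, eps), like normalize(..., p=2, dim=1)."""
    out = []
    for v in vectors:
        norm = max(math.sqrt(sum(x * x for x in v)), eps)
        out.append([x / norm for x in v])
    return out


def max_abs_diff(a: Matrix, b: Matrix) -> float:
    """Largest element-wise absolute difference between two equal-shape matrices."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("embedding matrices must have the same shape")
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def embeddings_match(a: Batch, b: Batch, atol: float = 1e-4) -> bool:
    """Compare pooled, normalized embeddings from two backends within a tolerance."""
    return max_abs_diff(l2_normalize(cls_pool(a)), l2_normalize(cls_pool(b))) <= atol


if __name__ == "__main__":
    # Hypothetical outputs: 1 sequence, 2 tokens, hidden size 3
    torch_out = [[[3.0, 4.0, 0.0], [1.0, 1.0, 1.0]]]
    ort_out = [[[3.00001, 4.0, 0.0], [0.9, 1.1, 1.0]]]  # tiny drift on [CLS]
    print(l2_normalize(cls_pool(torch_out)))  # [[0.6, 0.8, 0.0]]
    print(embeddings_match(torch_out, ort_out))  # True
```

With the real models you could feed in `model_output[0].tolist()` and `model_output_ort[0].tolist()`; in tensor code, `torch.allclose` with an explicit tolerance does the same job. Comparing with a tolerance rather than exact equality reflects that two runtimes rarely produce bit-identical floats.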