Support for embedding endpoint
#13 opened by ultraxyz
I deployed the model with vLLM and used the following code from https://docs.vllm.ai/en/latest/getting_started/examples/openai_embedding_client.html, but got a 404 error:
from openai import OpenAI

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"
client = OpenAI(
    # defaults to os.environ.get("OPENAI_API_KEY")
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id

responses = client.embeddings.create(
    input=[
        "Hello my name is",
        "The best thing about vLLM is that it supports many different models",
    ],
    model=model,
)

for data in responses.data:
    print(data.embedding)  # list of float of len 4096
Error message:
NotFoundError: Error code: 404 - {'detail': 'Not Found'}
Hi, you asked about the OpenAI embedding endpoint; for that question, please refer to the official OpenAI documentation. But since you also seem interested in our Yi-1.5-34B-Chat model, here is an example of running inference on it with vLLM that you can try:
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

tokenizer = AutoTokenizer.from_pretrained("01-ai/Yi-1.5-34B-Chat")
sampling_params = SamplingParams(
    temperature=0.8,
    top_p=0.8,
)
llm = LLM(model="01-ai/Yi-1.5-34B-Chat")

prompt = "Hi!"
messages = [
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

outputs = llm.generate([text], sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
Hi ultraxyz,
Yi-1.5 is not an embedding model. An embedding model takes text as input and outputs a list of floats with a fixed length (4096 in your example); such models usually belong to the BERT family and need to be loaded with sentence-transformers.
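For reference, here is a minimal sketch of loading a dedicated embedding model with sentence-transformers; the model name "BAAI/bge-large-en-v1.5" is just an illustrative choice, not a recommendation from this thread:

from sentence_transformers import SentenceTransformer

# Example embedding model (an assumption for illustration); any
# sentence-transformers-compatible model works here.
model = SentenceTransformer("BAAI/bge-large-en-v1.5")

sentences = [
    "Hello my name is",
    "The best thing about vLLM is that it supports many different models",
]
embeddings = model.encode(sentences)

for emb in embeddings:
    print(len(emb))  # fixed-length float vector (1024 for this model)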
lorinma changed discussion status to closed