inference (#1) - opened by veeragoni
How do I run this? Does the following convention work? (I get empty lines back.)
from llama_cpp import Llama

# Set chat_format according to the model you are using
llm = Llama(model_path="./ggml-model-q4_0.gguf", chat_format="llama-2")

chat_history = []  # prior turns, if any, as {"role": ..., "content": ...} dicts

output = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant that answers user's questions."},
        *chat_history,
        {"role": "user", "content": "cats and dogs playing"},
    ]
)

assistant_response = output["choices"][0]["message"]["content"]
print(assistant_response)
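
If the output is still empty, one possible cause is a chat_format that doesn't match the model's actual prompt template. A minimal sketch for narrowing that down (the prompt text and max_tokens value here are arbitrary, and the model path is assumed to be the same file as above):

from llama_cpp import Llama

# verbose=True prints load/inference details, which can show how the prompt is being built
llm = Llama(model_path="./ggml-model-q4_0.gguf", verbose=True)

# A raw completion bypasses the chat template entirely; if this produces text
# but create_chat_completion does not, the chat_format is the likely culprit
raw = llm("Q: Describe cats and dogs playing. A:", max_tokens=128, stop=["Q:"])
print(raw["choices"][0]["text"])

# Chat completion with an explicit token cap, in case the default cuts off early
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "cats and dogs playing"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])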