HuggingFace Newbie - What input does this model expect?
I am fairly new to HuggingFace and to deploying models for inference. I am using beam.cloud to deploy this model, but I'm not sure how to actually use it. When I send a list of messages, the response contains an output field with what looks like gibberish text completion.
What type of input does this model expect? Is there somewhere to see that on HuggingFace?
How is this model intended to be used? Should I be constantly sending partial transcripts until it tells me the turn is over?
Thanks!
Here's a quick example of trying to use the model with the transformers library. What task should I use? In the example it says text-generation, but that doesn't give me understandable results:
from transformers import pipeline, Pipeline

messages = [
    {"role": "user", "content": "Who are you?"},
    {"role": "assistant", "content": "I am me."},
    {"role": "user", "content": "But who are you really?"},
    {"role": "assistant", "content": "I am me."},
    {"role": "user", "content": "But who does"},
]

pipe: Pipeline = pipeline("text-generation", model="livekit/turn-detector")
result = pipe(messages)
print(result)
Outputs:
[{'generated_text': [{'role': 'user', 'content': 'Who are you?'}, {'role': 'assistant', 'content': 'I am me.'}, {'role': 'user', 'content': 'But who are you really?'}, {'role': 'assistant', 'content': 'I am me.'}, {'role': 'user', 'content': 'But who does'}, {'role': 'assistant', 'content': 'youwhatwhatwhatwhat'}]}]
This seems to work better:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("livekit/turn-detector")

messages = [
    {"role": "user", "content": "Who are you?"},
    {"role": "assistant", "content": "I am John."},
    {"role": "user", "content": "What is your last name?"},
    {"role": "assistant", "content": "Smith."},
    {"role": "user", "content": "How do you spell the first"},
]

# Format messages using the chat template
text = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=False,
    add_special_tokens=False,
    tokenize=False,
)

# Remove the EOU token from the current utterance
ix = text.rfind("<|im_end|>")
text = text[:ix]

# Tokenize
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Load model
model = AutoModelForSequenceClassification.from_pretrained("livekit/turn-detector")

# Get prediction
with torch.no_grad():
    outputs = model(**inputs)
    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)

print("probabilities", probabilities)

# Use index 1 for the positive class probability
eou_probability = probabilities[0, 1].item()
print(f"End of utterance probability: {eou_probability}")
It outputs something like this:
probabilities tensor([[0.9695, 0.0305]])
End of utterance probability: 0.030476752668619156
I imagine the first value in the tensor is the probability that the speech will continue and the second is the probability that the speech is finished.
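If that reading is right, I'd expect the next step to be comparing that second value against a cutoff before letting the agent reply. Here's a minimal sketch of what I mean, continuing from the code above; the 0.5 threshold and the assumption that index 1 is the end-of-utterance class are my guesses, not anything I found documented:

# Continuing from the snippet above. The threshold value and the meaning of
# index 1 are assumptions on my part, not documented behaviour of the model.
EOU_THRESHOLD = 0.5

def is_end_of_utterance(probabilities, threshold=EOU_THRESHOLD):
    # probabilities has shape (1, 2); I am assuming index 1 means "turn is finished"
    return probabilities[0, 1].item() >= threshold

if is_end_of_utterance(probabilities):
    print("Turn looks finished, so respond now")
else:
    print("Turn looks unfinished, so keep waiting for more transcript")

Is that roughly how this model is meant to be driven, or is there an intended API I'm missing?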
However, I found that with this code the probability varies from run to run for the same input, sometimes swinging all the way from 0 to 1.
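My guess is that the variation comes either from load time (for example a classification head that isn't actually in the checkpoint and gets randomly initialized, which I believe transformers warns about in the logs) or from something non-deterministic at inference time. Here's the kind of check I've been using to narrow it down; the seeding, the explicit model.eval(), and the plain-text stand-in input are just my attempts at isolating the problem, not anything from the model card:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# My attempt at forcing determinism: seed before loading so that any randomly
# initialized weights come out the same on every run. This is a guess, not
# documented behaviour of this model.
torch.manual_seed(0)

tokenizer = AutoTokenizer.from_pretrained("livekit/turn-detector")
model = AutoModelForSequenceClassification.from_pretrained("livekit/turn-detector")
model.eval()  # should already be the default after from_pretrained, but making it explicit

# Plain-text stand-in input, not the real chat-template formatting, since this
# is only checking determinism rather than prediction quality.
inputs = tokenizer("How do you spell the first", return_tensors="pt", truncation=True, max_length=512)

# Run the same input several times in one process: if these numbers differ,
# the problem is at inference time; if they match here but differ between
# separate runs of the script, it is probably happening at load time.
with torch.no_grad():
    for _ in range(3):
        probs = torch.nn.functional.softmax(model(**inputs).logits, dim=-1)
        print(probs)

Has anyone seen this behaviour with this model, and is a sequence-classification head even the right way to load it?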