How to infer in real time
#23, opened by thunder-007
Is there a way to stream the text out of the model in real time, as hosted LLMs do? This is my current code:
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it")
input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
thunder-007 changed discussion title from "How to get answers in real time" to "How to infer in real time"
How do I solve the following problem? ValueError: Tokenizer class GemmaTokenizer does not exist or is not currently imported.
@Jamison98 For me, it helped to update both transformers (as @alxdk mentioned) and torch: pip install "torch>=2.1.1" -U
@thunder-007: you can use TextStreamer to see the output while tokens are still being generated.
Example:
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it")
streamer = TextStreamer(tokenizer)
...
input_text = "Write me a poem about Machine Learning."
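# The tokenizer returns a dict-like object (input_ids and attention_mask), so it is unpacked with **.
inputs = tokenizer(input_text, return_tensors="pt")
# TextStreamer prints the decoded text to stdout as tokens are generated.
outputs = model.generate(**inputs, max_new_tokens=500, streamer=streamer)
# generate() still returns the full sequence, so this prints the complete text once more.
print(tokenizer.decode(outputs[0]))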
Happy inferencing!
Reference: https://huggingface.co/docs/transformers/generation_strategies#streaming
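If you want the text as Python values you can yield (for example, from a web handler) rather than printed to stdout, transformers also ships TextIteratorStreamer, which can be iterated over while generate() runs in a background thread. Below is a minimal sketch of that pattern, not the only way to do it. The helper names stream_text and demo are hypothetical and only for illustration. The model name, prompt and max_new_tokens value are taken from the example above.

```python
from threading import Thread


def stream_text(generate_fn, streamer, **generate_kwargs):
    """Run generate_fn in a background thread and yield chunks from streamer.

    generate_fn is expected to accept a `streamer` keyword argument
    (as model.generate does) and to push text into it while it runs.
    """
    kwargs = {**generate_kwargs, "streamer": streamer}
    thread = Thread(target=generate_fn, kwargs=kwargs)
    thread.start()
    # Iterating blocks until the next chunk is available and stops
    # once the streamer signals the end of generation.
    for chunk in streamer:
        yield chunk
    thread.join()


def demo():
    # Imported here so that stream_text stays usable without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

    tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
    model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it")
    inputs = tokenizer("Write me a poem about Machine Learning.", return_tensors="pt")

    # skip_prompt=True leaves the prompt out of the streamed text;
    # skip_special_tokens is forwarded to the tokenizer's decode call.
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

    for chunk in stream_text(model.generate, streamer, **inputs, max_new_tokens=500):
        print(chunk, end="", flush=True)
```

Calling demo() downloads the model and prints the poem piece by piece. Inside your own application you would consume stream_text(...) directly and forward each chunk wherever it is needed. Note that if generate() fails in the background thread, the loop can wait indefinitely; TextIteratorStreamer accepts a timeout argument that may help in that case.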
osanseviero changed discussion status to closed