How can I get results similar to those from Google AI Studio locally?

#14
by nitky - opened

Based on my testing, the model only works well with the chat template. The code examples in the README didn't work for me.

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-27b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-27b-it",
    device_map="auto",
    torch_dtype=torch.bfloat16
)

messages = [
    {"role": "user", "content": "Write me a poem about Machine Learning."}
]

input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to("cuda")

outputs = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(outputs[0]))
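
For reference, rendering the template as text shows exactly what the model sees (a sketch; the precise markers come from the tokenizer's own chat template):

# Render the chat template to a string instead of token IDs.
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
print(prompt)
# Gemma wraps each turn in <start_of_turn>/<end_of_turn> markers, roughly:
# <bos><start_of_turn>user
# Write me a poem about Machine Learning.<end_of_turn>
# <start_of_turn>model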

However, even with the chat template, the responses are not as good as those from Google AI Studio.
Could you recommend any settings?

Google org
edited Jul 1

Hey @nitky!

To get the best results, we recommend:

  • Using bfloat16 (which you already use)
  • Using attn_implementation='eager'
  • Using the latest transformers version.

Can you confirm whether you still get bad results after applying these settings?

By applying the settings you suggested, I was able to achieve output quality comparable to Google AI Studio's.

$ pip install -U transformers
$ pip list
...
transformers                 4.42.3
...

Here is my revised code:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-27b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-27b-it",
    device_map="auto",
    torch_dtype=torch.bfloat16,
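    # eager attention is recommended for Gemma 2, whose attention uses logit soft-capping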
    attn_implementation='eager'
)

messages = [
    {"role": "user", "content": "Write me a poem about Machine Learning."}
]

input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to("cuda")

outputs = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(outputs[0]))

Using this updated code, I confirmed that it works perfectly in multi-turn conversations.
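
For completeness, here is a minimal sketch of that multi-turn flow (the follow-up prompt is illustrative): append the model's reply to messages as an assistant turn, add the next user turn, and re-apply the chat template over the whole history.

# Keep only the newly generated tokens, dropping the prompt.
reply = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)

messages += [
    {"role": "assistant", "content": reply},
    {"role": "user", "content": "Now shorten it to four lines."},  # illustrative follow-up
]

input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to("cuda")
outputs = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))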

Thank you very much for your help!

nitky changed discussion status to closed
