Chat template is broken after tool calling support was added to the template

#61
by romanwozniak - opened

After this change was merged (https://huggingface.co./mistralai/Mistral-Nemo-Instruct-2407/commit/16bd7875cfcddb94d3bd5af433110e9b892b76e4), the chat template stopped working for conversations that contain assistant messages.

A sequence of system -> user messages – WORKS
A sequence of a single user message – WORKS
A sequence of user -> assistant -> user messages – DOESN'T WORK
A sequence of system -> user -> assistant -> user messages – DOESN'T WORK

The error message is:

After the optional system message, conversation roles must alternate user/assistant/user/assistant/...

It works for me...

Do you have an example message sequence?

I deployed the model with vllm and used the openai/v1/chat/completions endpoint to try it out.

Here is the request payload I'm using:

{
    "messages":
    [
        {
            "role": "system",
            "content": "You are a usefull assistant. Reply to the given prompts in a concise manner"
        },
        {
            "role": "user",
            "content": "Tell me a joke"
        },
        {
            "role": "assistant",
            "content": "Why don't bears wear shoes?\n\nBecause they have bear feet!"
        },
        {
            "role": "user",
            "content": "Great, tell another one!"
        }
    ],
    "model": "mistralai-mistral-nemo-instruct-2407",
    "max_tokens": 8096,
    "stream": false,
    "temperature": 0.3
}

Once I manually replaced tokenizer_config.json with the content from the 16bd7875cfcddb94d3bd5af433110e9b892b76e4 revision – it worked fine.
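
As a sanity check, the rule quoted in the error message can be applied to the payload above in plain Python. This is a minimal sketch of one reading of that rule (an optional leading system message, then strictly alternating user/assistant turns starting with user); the helper name `check_alternation` is hypothetical and this is not the template's actual logic.

```python
def check_alternation(messages):
    """Return None if roles alternate as the error message requires,
    otherwise a short description of the first violation.

    Assumption: the rule is "optional system message first, then
    user/assistant/user/... starting with user".
    """
    # Skip the optional leading system message.
    if messages and messages[0]["role"] == "system":
        turns = messages[1:]
    else:
        turns = messages

    for index, message in enumerate(turns):
        expected = "user" if index % 2 == 0 else "assistant"
        if message["role"] != expected:
            return f"turn {index}: expected {expected!r}, got {message['role']!r}"
    return None


# The message sequence from the request payload in this thread.
payload_messages = [
    {"role": "system", "content": "You are a usefull assistant. Reply to the given prompts in a concise manner"},
    {"role": "user", "content": "Tell me a joke"},
    {"role": "assistant", "content": "Why don't bears wear shoes?\n\nBecause they have bear feet!"},
    {"role": "user", "content": "Great, tell another one!"},
]

print(check_alternation(payload_messages))
```

Under this reading the payload is valid, which suggests the error comes from how the template sees the messages rather than from their order.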

Mistral AI_ org

Hi there! Interesting, the chat template and tokenizer seem to work well on my end with this repository. The following script runs without any problem:

from transformers import AutoTokenizer

hf_tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")

messages = [
        {
            "role": "system",
            "content": "You are a usefull assistant. Reply to the given prompts in a concise manner"
        },
        {
            "role": "user",
            "content": "Tell me a joke"
        },
        {
            "role": "assistant",
            "content": "Why don't bears wear shoes?\n\nBecause they have bear feet!"
        },
        {
            "role": "user",
            "content": "Great, tell another one!"
        }
    ]

hf_text = hf_tokenizer.apply_chat_template(messages, tokenize=False)
hf_tokens = hf_tokenizer.apply_chat_template(messages, tokenize=True)

print(hf_text)
print(hf_tokens)

Is it possible that the error comes from an outdated transformers version or an out-of-date configuration?

Thank you for looking into this.

I'm a bit lost, because your script also works in my environment (i.e. with the same version of the transformers library etc.).
At this point, I believe there could be something wrong with the way either vllm or kserve (kserve/kserve) loads the tokenizer config.

Below is the full stack trace I'm getting. I will keep looking for the root cause of this issue and will stick to the 16bd7875cfcddb94d3bd5af433110e9b892b76e4 revision for now.

  File "/kserve/kserve/protocol/rest/openai/endpoints.py", line 123, in create_chat_completion
    completion = await self.dataplane.create_chat_completion(
  File "/kserve/kserve/protocol/rest/openai/dataplane.py", line 100, in create_chat_completion
    return await model.create_chat_completion(completion_request)
  File "/kserve/kserve/protocol/rest/openai/openai_chat_adapter_model.py", line 204, in create_chat_completion
    chat_prompt = self.apply_chat_template(params.messages)
  File "/huggingfaceserver/huggingfaceserver/vllm/vllm_model.py", line 63, in apply_chat_template
    prompt=self.openai_serving_completion.apply_chat_template(messages)
  File "/huggingfaceserver/huggingfaceserver/vllm/vllm_completions.py", line 356, in apply_chat_template
    return self.tokenizer.apply_chat_template(
  File "/prod_venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1833, in apply_chat_template
    rendered_chat = compiled_template.render(
  File "/prod_venv/lib/python3.10/site-packages/jinja2/environment.py", line 1304, in render
    self.environment.handle_exception()
  File "/prod_venv/lib/python3.10/site-packages/jinja2/environment.py", line 939, in handle_exception
    raise rewrite_traceback_stack(source=source)
  File "<template>", line 14, in top-level template code
  File "/prod_venv/lib/python3.10/site-packages/jinja2/sandbox.py", line 394, in call
    return __context.call(__obj, *args, **kwargs)
  File "/prod_venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1914, in raise_exception
    raise TemplateError(message)
jinja2.exceptions.TemplateError: After the optional system message, conversation roles must alternate user/assistant/user/assistant/...

It looks like your problem stems from kserve. It might have pinned an older transformers version that sets up the template environment in a way that breaks the rendering.

The version of transformers used by kserve (in the pre-release version I'm currently testing) is 4.43.3 (ref), which I also confirmed via pip freeze.
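
The same check can be run from inside the Python interpreter that the server actually uses, which rules out a mismatch between the shell's environment and the serving process. A minimal sketch using the standard library; the list of package names is an assumption about what is relevant here.

```python
from importlib import metadata


def installed_versions(packages=("transformers", "jinja2", "vllm", "kserve")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```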

I suspect that some combination of the arguments passed to compiled_template.render(...) breaks the rendering of the latest Nemo template. By "passed arguments" I mean tools, documents and template_kwargs.
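
One way to test this suspicion is to render the same messages several times while varying only the extra arguments. The sketch below is an assumption-laden probe, not a confirmed diagnosis: `probe_template_kwargs` and the `VARIANTS` table are hypothetical names, and the variants are guesses at what a serving layer might pass. It relies only on `apply_chat_template` accepting `tools` and `documents` keyword arguments, as recent transformers releases do.

```python
def probe_template_kwargs(tokenizer, messages, variants):
    """Render `messages` once per kwargs variant and record "ok" or the error."""
    results = {}
    for label, kwargs in variants.items():
        try:
            tokenizer.apply_chat_template(messages, tokenize=False, **kwargs)
            results[label] = "ok"
        except Exception as exc:  # in this thread the failure is a jinja2 TemplateError
            results[label] = f"{type(exc).__name__}: {exc}"
    return results


# Hypothetical argument combinations to compare against the defaults.
VARIANTS = {
    "defaults": {},
    "tools=None": {"tools": None},
    "tools=[]": {"tools": []},
    "documents=None": {"documents": None},
    "documents=[]": {"documents": []},
}
```

With a real tokenizer this would be called as probe_template_kwargs(AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407"), messages, VARIANTS), using the messages list from the script earlier in the thread. Any variant that reports the alternation error while "defaults" reports "ok" would narrow down which argument is involved.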
