Possibly the provided prompt format is wrong.

#1
by vevi33 - opened

Hi!
Thanks for the very quick quants. This model is really great, however apparently there is a big misunderstanding around the new Mistral prompt format. (It also differs from the official Mistral description.)

Here is my reddit post about it:

https://www.reddit.com/r/LocalLLaMA/comments/1fjb4i5/mistralsmallinstruct2409_is_actually_really/

Marinara also confirmed my theory a few weeks ago. (You can find it in the model description)
https://huggingface.co./MarinaraSpaghetti/NemoMix-Unleashed-12B-GGUF

The correct one should be:

<s>[INST] user message[/INST] assistant message</s>[INST] new user message[/INST]

Another source:
https://community.aws/content/2dFNOnLVQRhyrOrMsloofnW0ckZ/how-to-prompt-mistral-ai-models-and-why

I tested it with both your version and ours. Nemo and this model are way more coherent and "clever" with the suggested format.
With yours, it was broken in many of my tests. (More details in the Reddit post.)

I can confirm this with the older Mistral Nemo based models (still d/l'ing this one, presumably it will be the same).

God, I wish Mistral used a better prompt format

I just throw in what the actual tokenizer chat template compiles to, hence the <s> at the start, and I assume the Jinja will handle the rest properly, which it looks like it will?

I can't speak to whether the system prompt should get its own response; that feels like just multi-turn prompting and suggests that a system message just isn't supported.

Otherwise I see no difference in the chat template provided vs the one in the AWS link

God, I wish Mistral used a better prompt format

You don't need a better prompt format if you just use the model's original tokenizer.
Not sure how GGUF people handle this issue, but I was able to make a quick Python script using the transformers library to instantiate the tokenizer from here:

and if you want to use the v3 tokenizer, you can use the same JSON, but instead with this model:

That will allow you to never care about the prompt format.
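
For what it's worth, a minimal sketch of that approach (the repo id below is an assumption, swap in whichever repository hosts the tokenizer you want):

from transformers import AutoTokenizer

# Assumption: the original instruct repo; any repo with the right
# chat template in its tokenizer config works the same way.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-Small-Instruct-2409")

messages = [
    {"role": "user", "content": "user message"},
    {"role": "assistant", "content": "assistant message"},
    {"role": "user", "content": "new user message"},
]

# tokenize=False returns the prompt string instead of token ids,
# so you can see exactly what the model receives.
prompt = tokenizer.apply_chat_template(messages, tokenize=False)
print(prompt)
# -> <s>[INST] user message[/INST] assistant message</s>[INST] new user message[/INST]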

Also, using good inference engines, you usually have both a completions endpoint (no tokenizer; needs you to define the prompt format) and a chat/completions endpoint (which uses the tokenizer and does not need you to specify the prompt format).
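
For example, against an OpenAI-compatible server (the base URL and model name here are just assumptions):

import requests

BASE = "http://localhost:8000/v1"  # assumption: an OpenAI-compatible server

# completions: no tokenizer involved, you format the prompt yourself
raw = requests.post(f"{BASE}/completions", json={
    "model": "Mistral-Small-Instruct-2409",
    "prompt": "<s>[INST] user message[/INST]",
}).json()

# chat/completions: the server applies the chat template for you
chat = requests.post(f"{BASE}/chat/completions", json={
    "model": "Mistral-Small-Instruct-2409",
    "messages": [{"role": "user", "content": "user message"}],
}).json()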

Made a Jinja2 prompt template here to support non-alternating user/assistant/user/assistant... sequences by gluing consecutive messages from the same role together.

{{- '<s>' }}
{%- for message in messages %}
    {#- Look up neighbours so consecutive same-role messages can be glued together. -#}
    {%- set prev_message = messages[loop.index0 - 1] if not loop.first else None %}
    {%- set next_message = messages[loop.index] if not loop.last else None %}

    {%- if message['role'] != 'assistant' %}
        {#- Open [INST] only at the start of a user/system run. -#}
        {%- if not prev_message or prev_message['role'] == 'assistant' %}
            {{- '[INST] ' }}
        {%- endif %}
        {{- message['content'] }}
        {#- Close [/INST] only when the run ends; inside a run, glue with newlines. -#}
        {%- if not next_message or next_message['role'] == 'assistant' %}
            {{- '[/INST]' }}
        {%- elif message['role'] == 'system' %}
            {{- '\n\n' }}
        {%- else %}
            {{- '\n' }}
        {%- endif %}

    {%- elif message['role'] == 'assistant' %}
        {#- A conversation that starts with the assistant gets an empty user turn first. -#}
        {%- if loop.first %}
            {{- '[INST] [/INST]' }}
        {%- endif %}
        {{- ' ' + message['content'] }}
        {#- Close the assistant turn; if another assistant message follows (or none does),
            insert an empty user turn so the sequence stays well-formed. -#}
        {%- if next_message and next_message['role'] != 'assistant' %}
            {{- '</s>' }}
        {%- else %}
            {{- '</s>[INST] [/INST]' }}
        {%- endif %}
    {%- endif %}
{%- endfor %}
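
If anyone wants to sanity-check it, here's a quick sketch that renders the template above with the jinja2 package (the file name is just a placeholder):

from jinja2 import Template

# Assumption: the template above is saved as mistral_glue.jinja
template = Template(open("mistral_glue.jinja").read())

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "first user line"},
    {"role": "user", "content": "second user line"},  # same role: glued, not re-wrapped
    {"role": "assistant", "content": "assistant reply"},
    {"role": "user", "content": "new user message"},
]

print(template.render(messages=messages))
# -> <s>[INST] You are a helpful assistant.
#
#    first user line
#    second user line[/INST] assistant reply</s>[INST] new user message[/INST]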

@vevi33
Hi there! Actually, the v3 should look more like:
<s>[INST] user message[/INST] assistant message</s>[INST] new user message[/INST]
For more deep explanations: https://github.com/mistralai/cookbook/blob/main/concept-deep-dive/tokenization/chat_templates.md

@pandora-s
Thank you for the clarification!

I proposed basically this, if I am not wrong, but I corrected my post according to your link, to be exactly the same and to not confuse anyone!
Thanks to everyone for being helpful and making this topic finally clear in the community!

<s>[INST] user message[/INST] assistant message</s>[INST] new user message[/INST]

For llama.cpp, the prompt template will look like this:

--in-prefix "</s>[INST] " --in-suffix "[/INST] " -p "<s>[INST] You are a helpful assistant.[/INST]"
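
Put together as a full invocation (the binary and model file names are placeholders, adjust for your setup):

./llama-cli -m Mistral-Small-Instruct-2409-Q4_K_M.gguf -i \
    --in-prefix "</s>[INST] " --in-suffix "[/INST] " \
    -p "<s>[INST] You are a helpful assistant.[/INST]"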

Hi there! Actually, the v3 should look more like:
<s>[INST] user message[/INST] assistant message</s>[INST] new user message[/INST]
For more deep explanations: https://github.com/mistralai/cookbook/blob/main/concept-deep-dive/tokenization/chat_templates.md

@pandora-s Just to clarify: what you've written here is the format one should use for Mistral-Small-Instruct-2409, right?


I used https://huggingface.co./MarinaraSpaghetti/SillyTavern-Settings

Awesome! Thanks! It really does contribute a lot... in everything, logic, prose, immersion... incredible.

I'm using Marinara's presets too and they make a world of difference as far as RP is concerned with Mistral models.


@ddh0 yes, the original Small repo was fixed a few hours ago with the correct template, sorry for the trouble!
