Please consider changing the instruct format!

#10
by MarinaraSpaghetti

Dear MistralAI Team! Just to be completely transparent: I absolutely love your models and your work; you guys are the best! However, I think I'm speaking on behalf of the entire community when I say: please, PLEASE consider ditching the gods-awful Mistral Instruct format!

It's abysmal to work with, especially for the folks who use the Text Completion API to run the models (which is what Oobabooga's WebUI or KoboldCpp offer). It's simply confusing to set up and very rigid, especially if someone is working with dynamic context insertions (lore book entries added at different depths, for example). People are also unsure what the correct format for each new model looks like; even legends like Bartowski himself got it mixed up. Hell, I myself had to double-check the NeMo one with one of your team members, only to learn that I'd been using it incorrectly too!

Not to mention, the lack of a proper system prompt is actively harming the model's capabilities and its reasoning, especially at higher contexts. We need a clear distinction between the different roles: system, user, and assistant. The format should recognize the system prompt as the main instruction to be followed at all times; that's how most, if not all, other formats handle this. Take ChatML, for example. It just works.
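For reference, this is all it takes in ChatML, where every role is explicit:

<|im_start|>system
{{System Prompt}}<|im_end|>
<|im_start|>user
{{User's Message}}<|im_end|>
<|im_start|>assistant
{{Assistant's Response}}<|im_end|>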

Even adding a simple [SYSTEM]/[/SYSTEM] tag would help IMMENSELY. Or leaving [INST]/[/INST] for the system, while adding [USER]/[/USER] for the user. Here is my ideal example, using Mistral Small's format:

<s>
[SYSTEM]
{{System Prompt}}
[/SYSTEM]
[INST]
{{User's Message}}
[/INST]
{{Assistant's Response}}
</s>
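
For what it's worth, rendering that would be trivial. Here's a quick sketch of a hypothetical helper (nothing official; the tag names are just my suggestion from above):

# Hypothetical renderer for the proposed format; "None" as the
# assistant message means it's the model's turn to respond.
def render_prompt(system, turns):
    parts = ["<s>", f"[SYSTEM]\n{system}\n[/SYSTEM]"]
    for user_msg, assistant_msg in turns:
        parts.append(f"[INST]\n{user_msg}\n[/INST]")
        if assistant_msg is not None:
            parts.append(f"{assistant_msg}\n</s>")
    return "\n".join(parts)

# Reproduces the example above exactly:
print(render_prompt("{{System Prompt}}", [("{{User's Message}}", "{{Assistant's Response}}")]))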

Please, I beg you on my knees, with tears, snorts, and all, leaking down my face: at least consider this possibility. You will make our lives easier and your models even superior to what they already are. Thank you.

Mistral AI_ org

Hi @MarinaraSpaghetti,
We have actually written a document in the cookbook repo delving into this; it explains the slight differences between each tokenizer and what should be used as ground truth, in detail: https://github.com/mistralai/cookbook/blob/main/concept-deep-dive/tokenization/chat_templates.md . We will increase the amount of documentation regarding the tokenizers and chat templates to help out! Hopefully this document will answer most of your doubts!
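If you ever need the ground truth programmatically, the mistral-common library can render and tokenize a conversation for you. A minimal sketch, assuming its 1.x API:

# Pick the tokenizer version (v1/v2/v3) that matches your model.
from mistral_common.protocol.instruct.messages import SystemMessage, UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

tokenizer = MistralTokenizer.v3()
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="hello"),
    ])
)
print(tokenized.text)    # the exact string representation, whitespace included
print(tokenized.tokens)  # the token IDs the model actually sees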

Hey @pandora-s, thank you, the document is super useful! I was there when it was being conceived, and I'm still super grateful to you for creating it and sending it over on Discord. :)

But sadly, this doesn't answer the main concern I've raised: the lack of a proper system/user/assistant distinction in the Mistral format itself. Because of that, the model forgets its initial instructions and doesn't work as well as intended in multi-turn conversations, especially at higher contexts, like I previously mentioned. I know you cannot confirm whether you'll be changing the format in the future. All I'm asking is that you at least consider my request and re-evaluate how the format could potentially look, taking into consideration what other successful models are using. It would be amazing, truly.

Once again, thank you for all the hard work and for the incredible job you're doing!

Hi @MarinaraSpaghetti,
We have actually written a document in the cookbook repo delving into this; it explains the slight differences between each tokenizer and what should be used as ground truth, in detail: https://github.com/mistralai/cookbook/blob/main/concept-deep-dive/tokenization/chat_templates.md . We will increase the amount of documentation regarding the tokenizers and chat templates to help out! Hopefully this document will answer most of your doubts!

Wow, thanks for the boilerplate response! /s

Here's the problem with it: it's not standard, and I don't see a reason for it to have its own format. Why not use ChatML, like @MarinaraSpaghetti suggested? ChatML is at least recognizable by most frontends, and the instruct format itself doesn't make sense.

[INST] {{User's Message}} [/INST] {{Assistant's Response}} </s>

Okay? Why do you need to use INST twice? It doesn't denote which is which beyond one being {{user}} and the other being {{assistant}}; one little screw-up and you've confused yourself into thinking it's broken. Said cookbook doesn't explain any advantage or use other than the prompt itself. I've noticed a similar problem via Mistral's API, where this format would ruin interactions because the model itself doesn't know who is who.
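
To illustrate, a two-turn exchange ends up as one undifferentiated string (spacing varies by tokenizer version):

<s>[INST] first user message [/INST] first response</s>[INST] second user message [/INST] second response</s>

Nothing but the bracket positions tells you which side said what.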

This isn't me trying to hate on said models or your business; it is simply my own opinion on the flaws of this format. It isn't personal.

I completely agree that the current Mistral format is very inflexible. Moreover, when something like two messages in a row from the user happens, or two or more responses from the assistant are needed, you really have to contort yourself so that the LLM understands everything correctly, and it becomes a very cumbersome construction of prefixes and suffixes, or a whole separate layer that reformats everything into the Mistral format. This is all incredibly inconvenient and unintuitive.
I really love Mistral models, but it is also so painful to use the current Mistral instruct format.
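
For example, such a reformatting layer, just to enforce the strict user/assistant alternation, ends up looking something like this (hypothetical shim):

# Hypothetical shim: collapse consecutive same-role messages so the
# history strictly alternates user/assistant before templating.
def merge_consecutive(messages):
    merged = []
    for msg in messages:
        if merged and merged[-1]["role"] == msg["role"]:
            merged[-1]["content"] += "\n\n" + msg["content"]
        else:
            merged.append(dict(msg))
    return merged

history = [
    {"role": "user", "content": "Hello?"},
    {"role": "user", "content": "Are you still there?"},  # second user turn in a row
    {"role": "assistant", "content": "Yes, I am."},
]
print(merge_consecutive(history))  # the two user turns become one message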

Mistral AI_ org

where this format would ruin interactions because the model itself doesn't know who is who.

@ProdeusUnity This sounds like a misunderstanding. As explained in the document, the strings are only representations; the model never sees the string, it sees a dedicated token ID that is not correlated to any string. It's the same as with the BOS and EOS tokens: they are not actually <s> or </s>, these are simply representations. It doesn't matter if it's [INST] or <user> (this would only matter for the tokenizer v1); the model never sees the strings since they became control tokens instead, it only sees an ID specially dedicated to them. This also avoids any possibility of prompt injection, making it extra safe!

We could in theory change the documentation and rename the control token to <user> without touching the model, and it would keep the exact same behavior as normal; the string equivalents of control tokens are simply representations for users and developers.
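
You can see this directly with mistral-common (a sketch, assuming its 1.x API):

from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

tokenizer = MistralTokenizer.v3()

# The template's [INST] markers are single control-token IDs:
real = tokenizer.encode_chat_completion(
    ChatCompletionRequest(messages=[UserMessage(content="hello")])
)
print(real.tokens)

# A user who *types* "[INST]" only produces ordinary text tokens,
# which is why the control tokens cannot be injected from content:
fake = tokenizer.encode_chat_completion(
    ChatCompletionRequest(messages=[UserMessage(content="[INST] hello")])
)
print(fake.tokens)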

Could you share more about this issue you mention? Couldn't it possibly be an issue with the system prompts, like mentioned previously by @MarinaraSpaghetti?

Mistral AI_ org

I completely agree that the current Mistral format is very inflexible. Moreover, when something like two messages in a row from the user happens, or two or more responses from the assistant are needed, you really have to contort yourself so that the LLM understands everything correctly, and it becomes a very cumbersome construction of prefixes and suffixes, or a whole separate layer that reformats everything into the Mistral format. This is all incredibly inconvenient and unintuitive.
I really love Mistral models, but it is also so painful to use the current Mistral instruct format.

Thanks for the feedback!

+1, will echo the sentiment here. I love the mistralai models, but fine-tuning is always hell because of the formats, the edge cases, the extreme sensitivity, the whitespace, etc. It would be amazing if they followed a simple, easy-to-read, human-readable format :)

We could in theory change the documentation and rename the control token to <user> without touching the model, and it would keep the exact same behavior as normal; the string equivalents of control tokens are simply representations for users and developers.

This would be a good starting point (I didn't know you could do that!). Even developers struggle with all the formats and their quirks/edge cases + all the libraries and their own layers.

Even developers struggle with all the formats and their quirks/edge cases + all the libraries and their own layers.

Hi, developer here.

I'm practically in love with Mistral as a company, and the performance of the Mistral models is consistently, delightfully impressive... on the API.

However I have never quite been able to 100% match the performance I'm seeing on the API with the local GGUFs. Every time I think I have the prompt format finally perfectly figured out, a new model comes out with a very slightly tweaked tokenizer, or there's a new post on r/LocalLLaMA detailing the new, finally-figured-out prompt format, whether to add whitespace or not, a new off-label way to wrangle the model into obeying a system prompt... it's so frustrating because I can see that there is such potential and there's just some little prompt format weirdness holding it back.

If it was only me having this issue I would think I was stupid and that would be the end of it. But it's clearly not.

I would suggest something like this:

[SYSTEM]
Always assist with care, respect, and truth. Respond with utmost utility yet securely. Avoid harmful, unethical, prejudiced, or negative content. Ensure replies promote fairness and positivity.
[/SYSTEM]
[USER]
hello
[/USER]
[BOT]
Hello! How can I assist you today? Let me know if you have any questions, need some advice, or just want to chat. I'm here to help! 😊
[/BOT]
[USER]
thank you
[/USER]

etc.

I really hope I'm not sounding ungrateful, I truly appreciate the contributions Mistral's made to the community. I just think the current format is really holding the models back.

@pandora-s ❤️🤗

+1 on one of the best models having, sadly, one of the worst instruct formats I have had the displeasure of dealing with. It's VERY inflexible; the "add a space before and after, oh, in fact, don't anymore, well, it depends on the tokenizer" dance has been very annoying and wholly unnecessary. The "rolling system prompt" (at the end of the last query) objectively worsens the model's output outside of extremely specific tasks, and I suspect it is the cause of most fine-tunes being, on average, pretty brain-dead out of the box. It also conflicts with most or all methods used to reduce prompt-processing computation.
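
To illustrate the caching conflict, here is a tiny sketch (hypothetical renderer, v3-style layout assumed):

# Hypothetical renderer: the system prompt rides inside the *last*
# [INST] block, which is roughly what the current format does.
def render(turns, system):
    out = "<s>"
    for i, (user, bot) in enumerate(turns):
        text = f"{system}\n\n{user}" if i == len(turns) - 1 else user
        out += f"[INST] {text} [/INST]"
        if bot is not None:
            out += f" {bot}</s>"
    return out

system = "You are terse."
turn1 = render([("hi", None)], system)
turn2 = render([("hi", "Hello."), ("how are you?", None)], system)

# The two prompts diverge almost immediately, because the system
# prompt moved from the first [INST] block to the second one:
shared = next(i for i, (a, b) in enumerate(zip(turn1, turn2)) if a != b)
print(repr(turn1[:shared]))  # only "<s>[INST] " is a reusable prefix

Every turn, the longest shared prefix with the previous prompt ends right where the system prompt used to sit, so KV-cache prefix reuse gets almost nothing to work with.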

I get that you want to get rid of system prompts altogether (at least, that's what I'm guessing; it's the only reason for this esoteric formatting, really), but still.
