Prompt format
Interesting, I used the prompt format that was in the model card before and it worked very well (system prompt: "Comment the source."), but I guess it can be placed in the main prompt too with the same effect. Thank you for providing the files.
Yeah, they don't specify a template, but it's clearly meant to be chatted with. I'll update if I find the proper one, but I'm glad to hear the default instruct format worked; maybe I'll put it back for now.
Confirmed working template:
simple:
<s>[INST] {user_prompt} [/INST] {assistant_response} </s><s>[INST] {new_user_prompt} [/INST]
with system prompt:
<s>[INST] <<SYS>>
{system_prompt}
<</SYS>>
{user_prompt} [/INST] {assistant_response} </s><s>[INST] {new_user_prompt} [/INST]
Hope it helps :)
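If you want to assemble that template programmatically, here's a minimal Python sketch (build_prompt and the (user, assistant) pair format are just mine for illustration):

# Minimal sketch; note <s>/</s> are written as literal text here, which only
# works if the tokenizer parses special tokens out of plain text (see the
# BOS discussion further down).
def build_prompt(turns, system_prompt=None):
    # turns: list of (user, assistant) pairs; assistant is None for the last turn
    prompt = ""
    for i, (user, assistant) in enumerate(turns):
        if i == 0 and system_prompt is not None:
            user = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n{user}"
        prompt += f"<s>[INST] {user} [/INST]"
        if assistant is not None:
            prompt += f" {assistant} </s>"
    return prompt

print(build_prompt([("Comment the source.", None)]))
# -> <s>[INST] Comment the source. [/INST]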
-p '' works too in my tests with llama.cpp, but sure, it's possible the instruct syntax could be better, thanks.
Here's how the official Mistral v3 (codestral and mixtral-8x22b) tokenizer handles a fill-in-middle request:
>>> tokenizer.encode_fim(FIMRequest(prompt='hello', suffix='world'))
Tokenized(tokens=[1, 13, 10239, 11, 7080, 29477], text='<s>[SUFFIX]world[PREFIX]▁hello', prefix_ids=None)
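(If you want to reproduce this: a minimal setup sketch, assuming the mistral-common package; double-check the import paths against your version.)

# pip install mistral-common
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.fim.request import FIMRequest

tokenizer = MistralTokenizer.v3()  # v3 = codestral / mixtral-8x22b

# Note the suffix is encoded before the prefix in the FIM layout.
fim = tokenizer.encode_fim(FIMRequest(prompt='hello', suffix='world'))
print(fim.text)    # <s>[SUFFIX]world[PREFIX]▁hello
print(fim.tokens)  # [1, 13, 10239, 11, 7080, 29477]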
Chat is tokenized as expected:
>>> tokenizer.encode_chat_completion(ChatCompletionRequest(messages=[AssistantMessage(content='one'), UserMessage(content='two')]))
Tokenized(tokens=[1, 3, 4, 1392, 2, 3, 1757, 4], text='<s>[INST][/INST]▁one</s>[INST]▁two[/INST]', prefix_ids=None)
However, system messages appear to be attached to the last user message, with only two line feeds separating them:
>>> tokenizer.encode_chat_completion(ChatCompletionRequest(messages=[SystemMessage(content='one'), UserMessage(content='two'), AssistantMessage(content='three'), UserMessage(content='four')]))
Tokenized(tokens=[1, 3, 1757, 4, 2480, 2, 3, 1392, 781, 781, 14939, 4], text='<s>[INST]▁two[/INST]▁three</s>[INST]▁one<0x0A><0x0A>four[/INST]', prefix_ids=None)
I.e. the prompt would be something like:
<s> [INST] {user_prompt} [/INST] {assistant_prompt} </s> [INST] {system_prompt}
{user_prompt} [/INST]
But if you're sending a prompt in text format to llama.cpp I think it will add the <s>
(BOS) token automatically. The spaces around each special token shouldn't actually be there, but I think at least some tokenizers need them to detect that they are in fact special tokens. Might be a good idea to verify that your prompt template tokenizes correctly by hand.
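For example (a sketch, assuming mistral-common; the instruct_tokenizer.tokenizer attribute path is my guess at where the raw-text encoder lives, so treat it as an assumption):

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

tokenizer = MistralTokenizer.v3()

# Reference: the official encoding of a single user turn.
official = tokenizer.encode_chat_completion(
    ChatCompletionRequest(messages=[UserMessage(content='two')])).tokens

# Your hand-written template, encoded as plain text (attribute path assumed).
raw = tokenizer.instruct_tokenizer.tokenizer.encode('[INST] two [/INST]',
                                                    bos=True, eos=False)

print(official == raw)  # False means the text template doesn't reproduce the special tokens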
Multiple system messages are all added to the last user message, each followed by two newlines.
It's worth asking them, maybe. Might be a bug.
The v1 tokenizer (Mistral 7b, Mixtral 8x7b) adds all system messages to the first user message:
>>> tokenizer1.encode_chat_completion(ChatCompletionRequest(messages=[SystemMessage(content='system1'), UserMessage(content='user1'), AssistantMessage(content='ass1'), UserMessage(content='user2'), SystemMessage(content='system2')]))
Tokenized(tokens=[1, 733, 16289, 28793, 1587, 28740, 13, 13, 6574, 28750, 13, 13, 1838, 28740, 733, 28748, 16289, 28793, 1155, 28740, 2, 733, 16289, 28793, 2188, 28750, 733, 28748, 16289, 28793],
text='<s>▁[INST]▁system1<0x0A><0x0A>system2<0x0A><0x0A>user1▁[/INST]▁ass1</s>▁[INST]▁user2▁[/INST]', prefix_ids=None)
Also note that [INST] and [/INST] weren't special tokens in the v1 tokenizer, but they are in v3.
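You can see the difference directly by encoding the same request with both (same mistral-common sketch as above):

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

req = ChatCompletionRequest(messages=[UserMessage(content='two')])
print(MistralTokenizer.v1().encode_chat_completion(req).tokens)
# v1: [INST] shows up as ordinary text pieces (733, 16289, 28793 above)
print(MistralTokenizer.v3().encode_chat_completion(req).tokens)
# v3: [INST] is a single control token (id 3)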
I'm thinking it might be an advantage to have them near the end, as system messages at the top usually seem to have less and less effect the longer the conversation goes on. KV cache shifting algorithms might need to get a little more sophisticated to avoid having to reevaluate everything constantly, though.
Noob question, I suppose, but how does that prompt template translate into an LM Studio config?
Is it different from the Mistral Instruct template? https://github.com/lmstudio-ai/configs/blob/main/mistral-instruct.preset.json
{
"name": "Mistral Instruct",
"inference_params": {
"input_prefix": "[INST]",
"input_suffix": "[/INST]",
"antiprompt": [
"[INST]"
],
"pre_prompt_prefix": "",
"pre_prompt_suffix": ""
},
"load_params": {
"rope_freq_scale": 0,
"rope_freq_base": 0
}
}
Yeah, just use the Mistral Instruct prompt format in LM Studio. It can be used in other ways, but the [INST] format will work nicely for instruction following.
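For context, my understanding (not verified against LM Studio's source) is that the preset renders each turn roughly as:

[INST] {user_prompt} [/INST] {assistant_response} [INST] {new_user_prompt} [/INST]

with the antiprompt stopping generation if the model starts emitting a new [INST] itself.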