
Fix prompt format in llama.cpp command

#2
by nacs - opened

Airoboros uses Vicuna style prompt (not Alpaca)

Huh, yeah, that's not right. I thought I had it working so that the right prompt template gets put in for GGML/GGUF files. Maybe it's a regression specific to GGUF; I'll check it. Thanks for the PR.

TheBloke changed pull request status to merged

That's not the correct prompt format either.
This https://chat.openai.com/share/8e98059c-723a-460f-88f4-f010ec996925 analysis of the source code of QLORA tells you what the valid prompt formats are.
According to the README here: https://huggingface.co./jondurbin/airoboros-c34b-2.1

Prompt format

The training code was updated to randomize newline vs space: https://github.com/jondurbin/qlora/blob/main/qlora.py#L559C1-L559C1

Via experimentation and looking at the source code of QLORA, I arrived at the conclusion that there's a prompt format for airoboros-c34b-2.1 that works for every situation.

Look at my post here that mentions that prompt format several times: https://huggingface.co./TheBloke/Airoboros-c34B-2.1-GGUF/discussions/1#64edee8c9e9c0fc5f6df5b34

Long story short:

"""A chat.
USER: Can you explain the difference between vector and dequeue in C++?
ASSISTANT: """

Notice the single space character after the ASSISTANT:, and notice the period after the A chat.

As you can see in ChatGPT-4's analysis of the QLORA source code (linked above), no valid prompt format has a newline character after ASSISTANT:. It is always a single space after ASSISTANT:.
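For anyone scripting this, here is a minimal Python sketch of assembling the prompt in the format described above. The helper name `build_prompt` is hypothetical, and the layout (period after "A chat", single trailing space, no trailing newline) only reflects what is reported in this post, not a confirmed specification.

```python
def build_prompt(user_message: str, system: str = "A chat.") -> str:
    """Assemble a single-turn prompt in the format described in this thread.

    Assumptions (taken from the post, not from an official spec):
    - the system line ends with a period,
    - the prompt ends with "ASSISTANT: " (exactly one space, no newline).
    """
    return f"{system}\nUSER: {user_message}\nASSISTANT: "


prompt = build_prompt(
    "Can you explain the difference between vector and dequeue in C++?"
)
# repr() makes the trailing space and the newlines visible.
print(repr(prompt))
```

Printing with `repr()` is a deliberate choice: the trailing space is otherwise invisible in a terminal.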

A weird thing that I can't explain: When providing the prompt without a space after ASSISTANT:, like this:

"""A chat.
USER: Can you explain the difference between vector and dequeue in C++?
ASSISTANT:"""

the model always adds a space (but seems to get a little confused).

Then, when using the (supposedly) correct prompt format:

"""A chat.
USER: Can you explain the difference between vector and dequeue in C++?
ASSISTANT: """

The model always adds 1 additional space to the response, such that we get 2 spaces, like this: ASSISTANT: (2 spaces).
When I use the supposedly correct prompt format, the AI model produces high-quality responses, but it tends to get confused and sometimes include a double newline in its answer.

What's also odd is that in Jon Durbin's README, he doesn't use any space character after ASSISTANT:.

Note also that oobabooga/text-generation-webui has this setting:
Add the bos_token to the beginning of prompts.

I don't know what that setting does, but it might be important. I decided to turn it off because, as far as I can tell, the BOS token is not part of the prompt format in Airoboros-c34B-2.1, unlike llama2-chat models, where I understood the BOS token to be part of the prompt format.

Yeah that's true - there shouldn't be a newline after ASSISTANT:

I've fixed the issue in my GGUF and GGML code that was causing the prompt templates not to merge into the llama.cpp command properly, and I will roll the fix out to these repos shortly.

Hi @TheBloke I don't quite understand what you mean. I've recently added information to my previous reply, and I mentioned 3 issues altogether:

  1. No newline allowed after ASSISTANT:
  2. Period required after A chat
  3. Weird tendency for the model to add an extra space character after ASSISTANT:
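The three points can be sketched as a small Python check. The function names `check_prompt` and `clean_response` are hypothetical, and the rules only encode what is claimed in this thread. Stripping one leading space from the output is an assumed workaround for point 3, not something the model author recommends.

```python
def check_prompt(prompt: str) -> list[str]:
    """Return a list of problems with the prompt, per points 1 and 2."""
    problems = []
    # Point 2: the system line is "A chat." with a trailing period.
    if not prompt.startswith("A chat."):
        problems.append("system line should be 'A chat.' with a period")
    marker = "ASSISTANT:"
    idx = prompt.rfind(marker)
    if idx == -1:
        problems.append("missing ASSISTANT: marker")
        return problems
    tail = prompt[idx + len(marker):]
    # Point 1: no newline after ASSISTANT:, only a single space.
    if "\n" in tail:
        problems.append("newline after ASSISTANT: is not allowed")
    elif tail != " ":
        problems.append("expected exactly one space after ASSISTANT:")
    return problems


def clean_response(text: str) -> str:
    """Point 3 (assumed workaround): drop one extra leading space."""
    return text[1:] if text.startswith(" ") else text


print(check_prompt("A chat.\nUSER: hi\nASSISTANT: "))
print(check_prompt("A chat\nUSER: hi\nASSISTANT:\n"))
```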

Yes, I was saying that I will be updating the README to use the correct prompt template, as per the source README.

However, Jon has just told me that the C34B model had not been correctly updated in his repo. So the updated model I uploaded today (the second upload of this model) was not actually updated at all; it was the same as the first one, which is known to have various bugs.

So I am now doing it again, for the third time. This time it should finally work better - and then maybe the stated prompt template will work OK.

Check back in an hour or so for the newly updated GGUFs, and in ~3 hours for updated GPTQs.
