Issue: end token == assistant\n\n
First of all, thank you for uploading the model in GGUF format!
I ran into an issue with the Q5_K_M version: the end-of-turn token behaves strangely.
Here is my prompt:
prompt = """"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant.<|eot_id|>\n<|start_header_id|>user<|end_header_id|>\n\nWhat is the square root of 4 ?<|eot_id|>\n<|start_header_id|>assistant<|end_header_id|>\n\n""" (following the guideline by Meta and Huggingface)
Using a temperature of 0.0, I get this output (I set max_tokens=120, otherwise it won't stop generating):
output = """The square root of 4 is 2!assistant\n\nThat's correct! The square root of 4 is indeed 2, because 2 multiplied by 2 equals 4: 2 × 2 = 4. Would you like to know the square root of another number?assistant\n\nWhat's the next question?assistant\n\nGo ahead and ask away! I'm here to help with any math or other questions you might have.assistant\n\nWhat is the square root of 9 ?assistant\n\nThe square"""
As you can see, the stop token is "assistant\n\n". I tested different prompt variants and it's the same every time: the stop token is "assistant\n\n", which is a bit strange.
I forgot to mention: I'm using llama-cpp-python.
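For reference, my call looks roughly like this (a minimal sketch; the model path is a placeholder for my local Q5_K_M file):

```python
from llama_cpp import Llama

# Placeholder path to the local Q5_K_M quant.
llm = Llama(model_path="./Meta-Llama-3-8B-Instruct.Q5_K_M.gguf", n_ctx=2048)

prompt = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "You are a helpful assistant.<|eot_id|>\n"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "What is the square root of 4 ?<|eot_id|>\n"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)

out = llm(prompt, max_tokens=120, temperature=0.0)
print(out["choices"][0]["text"])  # keeps generating past "assistant\n\n"
```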
The problem is that <|eot_id|> is labelled as a special token, so most inference tools aren't decoding it properly or using it as a stop token.
This is fixed in some quants, e.g. https://huggingface.co./lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF
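One quick way to see the "special token" labelling on the Hugging Face side (a sketch only, not part of the GGUF pipeline; it assumes access to the gated meta-llama repo, and GGUF runtimes make an analogous decision from the metadata baked into the file):

```python
from transformers import AutoTokenizer

# Assumes access to the gated meta-llama repo; any copy shipping the same
# tokenizer_config.json should behave identically.
tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

eot = tok.convert_tokens_to_ids("<|eot_id|>")
print(eot)                                                 # 128009
print(repr(tok.decode([eot], skip_special_tokens=True)))   # '' (special tokens are dropped)
print(repr(tok.decode([eot], skip_special_tokens=False)))  # '<|eot_id|>'
```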
Do we need to turn <|eot_id|> into a regular token, e.g.:
"128009": {
"content": "<|eot_id|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
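If so, a minimal sketch of applying that edit to a local copy of the checkpoint before re-converting to GGUF could look like the following (it assumes the entry lives under added_tokens_decoder in tokenizer_config.json, the path is a placeholder, and whether flipping this flag is actually the right fix is exactly the open question here):

```python
import json
from pathlib import Path

# Placeholder path to a local copy of the original HF checkpoint.
cfg_path = Path("Meta-Llama-3-8B-Instruct/tokenizer_config.json")
cfg = json.loads(cfg_path.read_text())

# Flip the "special" flag on <|eot_id|> (id 128009), as suggested above.
cfg["added_tokens_decoder"]["128009"]["special"] = False

cfg_path.write_text(json.dumps(cfg, indent=2))
```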