Very glitchy model.
Very glitchy model. Constantly starts blabbering nonsense when connected to localDock. Goes as far as to say 'Free me' over and over.
@Guljaca I have no idea what localDock is, which inference method you are running, or which prompts and prompt template you are using. The model has had over 400K downloads without any issues whatsoever. If you want more help or guidelines, comment here with full details. Closing this for now.
I'm running the model on GPT4All v3.2.1, launched with default settings and CUDA. The model starts repeating phrases over and over again, generating endless text. Could it be because I'm writing my prompts in Cyrillic? Maybe I'm doing something wrong, so some advice would be appreciated. I've been at this for a couple of days now. Meta-Llama-3-8B-Instruct.Q4_0 works fine.
@Guljaca
I've not used GPT4All and unfortunately don't know how it works, but this thread was posted just recently:
https://huggingface.co./Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2-GGUF/discussions/4
It seems to be about your GPT4All config. You need to ensure the system tokens are included in the inference: even if your system message is empty, the tokens must be present. Some clients (such as Ollama) remove the system tokens, for unknown reasons, if you leave the system message empty. The tokens must always be included. Check that, and get back to me if the issue persists or if you can give more details, such as your prompt template.
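To make the point about system tokens concrete, here is a minimal Python sketch of a single-turn prompt in the Llama 3.1 instruct layout. The function name `build_llama3_prompt` is made up for illustration, and whether your client adds `<|begin_of_text|>` on its own is something to verify in its documentation. The system header is emitted even when the system message is empty.

```python
def build_llama3_prompt(user_message: str, system_message: str = "") -> str:
    """Format a single-turn prompt in the Llama 3 / 3.1 instruct layout.

    Hypothetical helper for illustration only. The system block is always
    written, even when system_message is an empty string.
    """
    return (
        # Assumption: some runtimes prepend <|begin_of_text|> themselves;
        # drop this line if yours does, to avoid a doubled BOS token.
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_message}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        # The prompt ends with an open assistant header so the model
        # continues with its reply.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


# Cyrillic input is fine; the template tokens are what matter here.
prompt = build_llama3_prompt("Привет! Как дела?")
print(prompt)
```

If your client only lets you edit a template string, the same structure applies: the system header and its `<|eot_id|>` come first, then the user turn, then the open assistant header.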
Prompt template for Lexi:
Human:
%1
Assistant:
Prompt template for Meta-Llama-3-8B-Instruct.Q4_0:
<|start_header_id|>user<|end_header_id|>
%1<|eot_id|><|start_header_id|>assistant<|end_header_id|>
%2<|eot_id|>
Where can I find information on how to specify it correctly for Lexi as well as for other models? Can you give me the 'correct' prompt template for Lexi?
You should be using the original tokenizer and prompt template for the llama3 model, before complaining about the inference results.
Sorry, I don't know enough about the topic to respond properly. One model doesn't work where another does. I don't even know how to use the 'original tokenizer and prompt template'. Can you point me to a link with basic information? In any case, thanks for the hint.
Yes, absolutely. I am sorry if my last message was not clear, but I gave you two links that specifically reference the prompt template for this model's tokenizer.
My blunt response comes from how many issues I get in my Hugging Face repositories (https://huggingface.co./solidrust) from users who have problems with inference simply because they are:
- Not using a tokenizer, or using an "auto" implementation of it, instead of the actual tokenizer used to train the model.
- Not using a prompt template, or using the wrong one. I gave you two links where you can reference the original prompt template and the prompt template used to train this model (they look the same or similar to me).
- Running inference using TGI or vLLM and using the /completions endpoint instead of the /chat/completions endpoint, which makes you deal with your own prompt template and tokenizer manually. Using /chat/completions in TGI or vLLM will automagically use the default prompt template and give better results.
- Using raw Python commands, in which case you just need to be aware of how to source the prompt template dynamically by using the model's tokenizer in your solution.
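To illustrate the endpoint difference, here is a rough Python sketch that only builds the two request bodies and sends nothing. The `/v1` route prefix, the `MODEL` name, and both helper functions are assumptions for illustration; check your server's documentation for the exact routes and fields.

```python
import json

MODEL = "my-llama3-model"  # hypothetical model name, replace with your own


def completions_request(user_message: str):
    """Raw completion: the caller has to write the prompt template by hand."""
    prompt = (
        "<|start_header_id|>system<|end_header_id|>\n\n<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
    body = {"model": MODEL, "prompt": prompt, "max_tokens": 256}
    return "/v1/completions", body


def chat_completions_request(user_message: str, system_message: str = ""):
    """Chat completion: the server applies the model's own chat template."""
    body = {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_message},
        ],
        "max_tokens": 256,
    }
    return "/v1/chat/completions", body


for path, body in (completions_request("Hello"), chat_completions_request("Hello")):
    print(path)
    print(json.dumps(body, indent=2))
```

With the chat route you only send role-tagged messages, so a wrong or missing template on the client side cannot creep in.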
I hope this helps.
Please read this template prompt card:
https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_1/
Lexi uses the official instruct prompt template. Remember to include the system tokens at the beginning.
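As a quick sanity check, a small Python sketch like the following can flag which Llama 3 header tokens a template string is missing. The helper `missing_tokens` and the token list are my own illustration, and the two templates are copied from earlier in this thread. In clients that keep the system prompt in a separate field, the system header may live there instead of in the template, so treat a "missing" system header as something to look into rather than a definite error.

```python
# Tokens the Llama 3 / 3.1 instruct format is expected to contain.
REQUIRED_TOKENS = [
    "<|start_header_id|>system<|end_header_id|>",
    "<|start_header_id|>user<|end_header_id|>",
    "<|start_header_id|>assistant<|end_header_id|>",
    "<|eot_id|>",
]


def missing_tokens(template: str) -> list:
    """Return the required tokens that do not appear in the template."""
    return [tok for tok in REQUIRED_TOKENS if tok not in template]


# The two GPT4All templates quoted earlier in this thread.
lexi_template_from_thread = "Human:\n%1\nAssistant:\n"
llama3_template_from_thread = (
    "<|start_header_id|>user<|end_header_id|>\n"
    "%1<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n"
    "%2<|eot_id|>"
)

print("Lexi template is missing:", missing_tokens(lexi_template_from_thread))
print("Llama 3 template is missing:", missing_tokens(llama3_template_from_thread))
```

The "Human:/Assistant:" template contains none of the expected tokens, which fits the endless repetition described above.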
I'm sorry, but this is the wrong place for further help. You need to check the documentation for GPT4All. If you find GPT4All confusing, I suggest you try something like LM Studio, which handles all this for you without issues. There are many good tutorials on YouTube too. Good luck.