llama.cpp support
Hi, ayyylol.
When I ran convert_hf_to_gguf.py, I encountered the following warnings:
WARNING:hf-to-gguf:**************************************************************************************
WARNING:hf-to-gguf:** WARNING: The BPE pre-tokenizer was not recognized!
WARNING:hf-to-gguf:** There are 2 possible reasons for this:
WARNING:hf-to-gguf:** - the model has not been added to convert_hf_to_gguf_update.py yet
WARNING:hf-to-gguf:** - the pre-tokenization config has changed upstream
WARNING:hf-to-gguf:** Check your model files and convert_hf_to_gguf_update.py and update them accordingly.
WARNING:hf-to-gguf:** ref: https://github.com/ggerganov/llama.cpp/pull/6920
WARNING:hf-to-gguf:**
WARNING:hf-to-gguf:** chkhsh: 4e2b24cc4770243d65a2c9ec19770a72f08cffc161adbb73fcbb6b7dd45a0aae
WARNING:hf-to-gguf:**************************************************************************************
So, following the warnings, I added the code below to convert_hf_to_gguf.py, and the GGUF file was created successfully.
diff --git a/convert_hf_to_gguf.py b/convert_hf_to_gguf.py
index 550dd5cf..21ff2ebd 100755
--- a/convert_hf_to_gguf.py
+++ b/convert_hf_to_gguf.py
@@ -590,6 +590,9 @@ class Model:
if chkhsh == "855059429035d75a914d1eda9f10a876752e281a054a7a3d421ef0533e5b6249":
# ref: https://huggingface.co./HuggingFaceTB/SmolLM-135M
res = "smollm"
+ if chkhsh == "4e2b24cc4770243d65a2c9ec19770a72f08cffc161adbb73fcbb6b7dd45a0aae":
+ # ref: https://huggingface.co./openai-community/gpt2
+ res = "gpt-2"
if res is None:
logger.warning("\n")
To make llama.cpp support the chat template of EXAONE-3.0-7.8B-Instruct, I added the following code to src/llama.cpp:
diff --git a/src/llama.cpp b/src/llama.cpp
index aaf8db49..19378aef 100644
--- a/src/llama.cpp
+++ b/src/llama.cpp
@@ -19009,6 +19009,21 @@ static int32_t llama_chat_apply_template_internal(
         if (add_ass) {
             ss << "Assistant:";
         }
+    } else if (tmpl == "exaone3" || (tmpl_contains("[|system|]") && tmpl_contains("[|assistant|]") && tmpl_contains("[|endofturn|]"))) {
+        // EXAONE-3.0-7.8B-Instruct
+        for (auto message : chat) {
+            std::string role(message->role);
+            if (role == "system") {
+                ss << "[|system|]" << trim(message->content) << "[|endofturn|]\n";
+            } else if (role == "user") {
+                ss << "[|user|]" << trim(message->content) << "\n";
+            } else if (role == "assistant") {
+                ss << "[|assistant|]" << trim(message->content) << "[|endofturn|]\n";
+            }
+        }
+        if (add_ass) {
+            ss << "[|assistant|]";
+        }
     } else {
         // template not supported
         return -1;
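To eyeball the prompt this branch produces, here is a small Python sketch that simply mirrors the C++ logic above (illustration only, not the llama.cpp API):

def format_exaone(chat, add_ass=True):
    # Mirrors the EXAONE branch added above: system/assistant turns end with [|endofturn|],
    # and add_ass leaves a dangling [|assistant|] tag for the model to complete.
    out = ""
    for msg in chat:
        role, content = msg["role"], msg["content"].strip()
        if role == "system":
            out += f"[|system|]{content}[|endofturn|]\n"
        elif role == "user":
            out += f"[|user|]{content}\n"
        elif role == "assistant":
            out += f"[|assistant|]{content}[|endofturn|]\n"
    if add_ass:
        out += "[|assistant|]"
    return out

print(format_exaone([
    {"role": "system", "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]))

The trailing [|assistant|] is what makes the model start its reply when add_ass is set.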
I ran llama-cli as follows:
llama-cli -m <PATH_TO_GGUF> -p "You are EXAONE model from LG AI Research, a helpful assistant." --chat-template exaone3 -cnv
Good luck to you.
I've just opened a PR that introduces support for ExaoneForCausalLM :)
https://github.com/ggerganov/llama.cpp/pull/9025