llama.cpp support

#8
by ayyylol - opened

Thank you for this model!

Will you support llama.cpp?

The llamafied versions are not compatible.

LG AI Research org
edited Aug 13

Hi, ayyylol.

When I ran convert_hf_to_gguf.py, I got the following warnings.

WARNING:hf-to-gguf:**************************************************************************************
WARNING:hf-to-gguf:** WARNING: The BPE pre-tokenizer was not recognized!
WARNING:hf-to-gguf:** There are 2 possible reasons for this:
WARNING:hf-to-gguf:** - the model has not been added to convert_hf_to_gguf_update.py yet
WARNING:hf-to-gguf:** - the pre-tokenization config has changed upstream
WARNING:hf-to-gguf:** Check your model files and convert_hf_to_gguf_update.py and update them accordingly.
WARNING:hf-to-gguf:** ref: https://github.com/ggerganov/llama.cpp/pull/6920
WARNING:hf-to-gguf:**
WARNING:hf-to-gguf:** chkhsh: 4e2b24cc4770243d65a2c9ec19770a72f08cffc161adbb73fcbb6b7dd45a0aae
WARNING:hf-to-gguf:**************************************************************************************

So, following the warnings, I added the code below to convert_hf_to_gguf.py, and the GGUF file was created successfully.


diff --git a/convert_hf_to_gguf.py b/convert_hf_to_gguf.py
index 550dd5cf..21ff2ebd 100755
--- a/convert_hf_to_gguf.py
+++ b/convert_hf_to_gguf.py
@@ -590,6 +590,9 @@ class Model:
         if chkhsh == "855059429035d75a914d1eda9f10a876752e281a054a7a3d421ef0533e5b6249":
             # ref: https://huggingface.co./HuggingFaceTB/SmolLM-135M
             res = "smollm"
+        if chkhsh == "4e2b24cc4770243d65a2c9ec19770a72f08cffc161adbb73fcbb6b7dd45a0aae":
+            # ref: https://huggingface.co./openai-community/gpt2
+            res = "gpt-2"

         if res is None:
             logger.warning("\n")
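
For context on why this one-line mapping is enough: the check in convert_hf_to_gguf.py (get_vocab_base_pre()) identifies the BPE pre-tokenizer by encoding a fixed check text with the model's tokenizer and hashing the resulting token IDs; if the hash matches a known value, res is set and the warning above goes away. Here is a minimal Python sketch of that idea. The real check text is the chktxt constant defined in convert_hf_to_gguf_update.py, so the placeholder string below will not reproduce the exact hash, and the model path is also a placeholder.

from hashlib import sha256
from transformers import AutoTokenizer

# Placeholder for the chktxt constant from convert_hf_to_gguf_update.py;
# using the real constant reproduces the 4e2b24cc... hash from the warning above.
chktxt = "<fixed check text from convert_hf_to_gguf_update.py>"

# trust_remote_code=True may be required since the model ships custom code.
tokenizer = AutoTokenizer.from_pretrained("<PATH_TO_EXAONE_MODEL>", trust_remote_code=True)

# Same hashing scheme as the converter: hash the string form of the token IDs.
chkhsh = sha256(str(tokenizer.encode(chktxt)).encode()).hexdigest()
print(chkhsh)  # compared against the known hashes to select `res`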

To make llama.cpp support the chat template of EXAONE-3.0-7.8B-Instruct, I added the following code to src/llama.cpp.


diff --git a/src/llama.cpp b/src/llama.cpp
index aaf8db49..19378aef 100644
--- a/src/llama.cpp
+++ b/src/llama.cpp
@@ -19009,6 +19009,21 @@ static int32_t llama_chat_apply_template_internal(
         if (add_ass) {
             ss << "Assistant:";
         }
+    } else if (tmpl == "exaone3" || (tmpl_contains("[|system|]") && tmpl_contains("[|assistant|]") && tmpl_contains("[|endofturn|]"))) {
+        // EXAONE-3.0-7.8B-Instruct
+        for (auto message : chat) {
+            std::string role(message->role);
+            if (role == "system") {
+                ss << "[|system|]" << trim(message->content) << "[|endofturn|]\n";
+            } else if (role == "user") {
+                ss << "[|user|]" << trim(message->content) << "\n";
+            } else if (role == "assistant") {
+                ss << "[|assistant|]" << trim(message->content) << "[|endofturn|]\n";
+            }
+        }
+        if (add_ass) {
+            ss << "[|assistant|]";
+        }
     } else {
         // template not supported
         return -1;
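
For reference, a small standalone Python sketch (not part of llama.cpp) that mirrors the formatting logic of the branch above, just to show what the rendered prompt looks like:

def format_exaone3(chat, add_ass=True):
    # Mirrors the C++ above: system and assistant turns are closed with
    # [|endofturn|], user turns are not, and add_ass appends the tag that
    # prompts the model to generate the next assistant turn.
    out = ""
    for msg in chat:
        content = msg["content"].strip()
        if msg["role"] == "system":
            out += "[|system|]" + content + "[|endofturn|]\n"
        elif msg["role"] == "user":
            out += "[|user|]" + content + "\n"
        elif msg["role"] == "assistant":
            out += "[|assistant|]" + content + "[|endofturn|]\n"
    if add_ass:
        out += "[|assistant|]"
    return out

print(format_exaone3([
    {"role": "system", "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]))

# Expected output:
# [|system|]You are EXAONE model from LG AI Research, a helpful assistant.[|endofturn|]
# [|user|]Hello!
# [|assistant|]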

I ran llama-cli as follows:

llama-cli -m <PATH_TO_GGUF> -p "You are EXAONE model from LG AI Research, a helpful assistant." --chat-template exaone3 -cnv

Good luck to you.

@yireun thank you very much, that is very kind of you. It works now!

I've just opened a PR that introduces support for ExaoneForCausalLM :)
https://github.com/ggerganov/llama.cpp/pull/9025

@mscheong thank you!
