Nexesenex/Llama_3.2_1b_Odyssea_Escalation_0.2-GGUF
IMPORTANT: These models are quantized with IK_Llama.cpp, not Llama.cpp.
This model was converted to GGUF format from Nexesenex/Llama_3.2_1b_Odyssea_Escalation_0.2
using IK Llama, a fork of llama.cpp, via ggml.ai's GGUF-my-repo space.
Refer to the original model card for more details on the model.
Use with llama.cpp (note: I have not tested this route with IK_Llama quants)
Install llama.cpp through brew (works on Mac and Linux)
brew install llama.cpp
Invoke the llama.cpp server or the CLI.
CLI:
llama-cli --hf-repo Nexesenex/Llama_3.2_1b_Odyssea_Escalation_0.2-GGUF --hf-file llama_3.2_1b_odyssea_escalation_0.2-bf16.gguf -p "The meaning to life and the universe is"
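If you want more control over generation, the same call accepts the standard mainline llama.cpp options. As a sketch, -n caps the number of generated tokens and -ngl offloads layers to the GPU; the values below are only examples:

llama-cli --hf-repo Nexesenex/Llama_3.2_1b_Odyssea_Escalation_0.2-GGUF --hf-file llama_3.2_1b_odyssea_escalation_0.2-bf16.gguf -p "The meaning to life and the universe is" -n 128 -ngl 99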
Server:
llama-server --hf-repo Nexesenex/Llama_3.2_1b_Odyssea_Escalation_0.2-GGUF --hf-file llama_3.2_1b_odyssea_escalation_0.2-bf16.gguf -c 2048
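Once the server is running, you can query it over HTTP. This is a minimal sketch assuming the default 127.0.0.1:8080 address and the standard llama.cpp /completion endpoint (untested with an IK_Llama build):

curl http://127.0.0.1:8080/completion -H "Content-Type: application/json" -d '{"prompt": "The meaning to life and the universe is", "n_predict": 64}'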
Note: You can also use this checkpoint directly through the usage steps listed in the Llama.cpp repo (necessary to use Croco).
Step 1: Clone the ik_llama.cpp.nxs fork from GitHub (necessary to use Croco).
git clone https://github.com/Nexesenex/ik_llama.cpp.nxs
Step 2: Move into the ik_llama.cpp.nxs folder and build it with the LLAMA_CURL=1
flag, along with other hardware-specific flags (e.g. LLAMA_CUDA=1 for Nvidia GPUs on Linux).
cd ik_llama.cpp.nxs && LLAMA_CURL=1 make
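For an Nvidia GPU build, a sketch of the same step with the CUDA flag mentioned above (assumes the CUDA toolkit is installed; the exact flag name may differ in this fork):

cd ik_llama.cpp.nxs && LLAMA_CURL=1 LLAMA_CUDA=1 make -j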
Step 3: Run inference through the main binary.
./llama-cli --hf-repo Nexesenex/Llama_3.2_1b_Odyssea_Escalation_0.2-GGUF --hf-file llama_3.2_1b_odyssea_escalation_0.2-bf16.gguf -p "The meaning to life and the universe is"
or
./llama-server --hf-repo Nexesenex/Llama_3.2_1b_Odyssea_Escalation_0.2-GGUF --hf-file llama_3.2_1b_odyssea_escalation_0.2-bf16.gguf -c 2048
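If your build lacks curl support and the --hf-repo download does not work, a possible workaround is to fetch the GGUF file yourself and pass it with -m (file name taken from the commands above):

huggingface-cli download Nexesenex/Llama_3.2_1b_Odyssea_Escalation_0.2-GGUF llama_3.2_1b_odyssea_escalation_0.2-bf16.gguf --local-dir .
./llama-cli -m llama_3.2_1b_odyssea_escalation_0.2-bf16.gguf -p "The meaning to life and the universe is"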