Nexesenex/Llama_3.2_1b_Odyssea_Escalation_0.2-GGUF
IMPORTANT: These models are quantized with IK_Llama.cpp, not Llama.cpp.
This model was converted to GGUF format from Nexesenex/Llama_3.2_1b_Odyssea_Escalation_0.2
using IK Llama, a fork of llama.cpp, via ggml.ai's GGUF-my-repo space.
Refer to the original model card for more details on the model.
Use with llama.cpp (note: I have not tested this route with IK_Llama quants)
Install llama.cpp through brew (works on Mac and Linux)
brew install llama.cpp
Invoke the llama.cpp server or the CLI.
CLI:
llama-cli --hf-repo Nexesenex/Llama_3.2_1b_Odyssea_Escalation_0.2-GGUF --hf-file llama_3.2_1b_odyssea_escalation_0.2-bf16.gguf -p "The meaning to life and the universe is"
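If you want more control over generation, the same call accepts the standard mainline llama.cpp options. As a sketch, -n caps the number of generated tokens and -ngl offloads layers to the GPU; the values below are only examples:

llama-cli --hf-repo Nexesenex/Llama_3.2_1b_Odyssea_Escalation_0.2-GGUF --hf-file llama_3.2_1b_odyssea_escalation_0.2-bf16.gguf -p "The meaning to life and the universe is" -n 128 -ngl 99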
Server:
llama-server --hf-repo Nexesenex/Llama_3.2_1b_Odyssea_Escalation_0.2-GGUF --hf-file llama_3.2_1b_odyssea_escalation_0.2-bf16.gguf -c 2048
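Once the server is running, you can query it over HTTP. This is a minimal sketch assuming the default 127.0.0.1:8080 address and the standard llama.cpp /completion endpoint (untested with an IK_Llama build):

curl http://127.0.0.1:8080/completion -H "Content-Type: application/json" -d '{"prompt": "The meaning to life and the universe is", "n_predict": 64}'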
Note: You can also use this checkpoint directly through the usage steps listed in the Llama.cpp repo (necessary to use Croco).
Step 1: Clone the ik_llama.cpp.nxs fork from GitHub (necessary to use Croco).
git clone https://github.com/Nexesenex/ik_llama.cpp.nxs
Step 2: Move into the ik_llama.cpp.nxs folder and build it with the LLAMA_CURL=1
flag, along with other hardware-specific flags (e.g. LLAMA_CUDA=1 for Nvidia GPUs on Linux).
cd ik_llama.cpp.nxs && LLAMA_CURL=1 make
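For an Nvidia GPU build, a sketch of the same step with the CUDA flag mentioned above (assumes the CUDA toolkit is installed; the exact flag name may differ in this fork):

cd ik_llama.cpp.nxs && LLAMA_CURL=1 LLAMA_CUDA=1 make -j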
Step 3: Run inference through the main binary.
./llama-cli --hf-repo Nexesenex/Llama_3.2_1b_Odyssea_Escalation_0.2-GGUF --hf-file llama_3.2_1b_odyssea_escalation_0.2-bf16.gguf -p "The meaning to life and the universe is"
or
./llama-server --hf-repo Nexesenex/Llama_3.2_1b_Odyssea_Escalation_0.2-GGUF --hf-file llama_3.2_1b_odyssea_escalation_0.2-bf16.gguf -c 2048
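If your build lacks curl support and the --hf-repo download does not work, a possible workaround is to fetch the GGUF file yourself and pass it with -m (file name taken from the commands above):

huggingface-cli download Nexesenex/Llama_3.2_1b_Odyssea_Escalation_0.2-GGUF llama_3.2_1b_odyssea_escalation_0.2-bf16.gguf --local-dir .
./llama-cli -m llama_3.2_1b_odyssea_escalation_0.2-bf16.gguf -p "The meaning to life and the universe is"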