GGUF version for model

#1
by samikr - opened

I tried using this model with llama.cpp today, but received a version error.

./build/bin/main -m ./models/3B/orca-mini-3b.q4_0.gguf -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt
Log start
main: build = 1575 (64e64aa)
main: built with cc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 for x86_64-linux-gnu
main: seed  = 1701267953
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3060 Laptop GPU, compute capability 8.6
gguf_init_from_file: GGUFv1 is no longer supported. please use a more up-to-date version
error loading model: llama_model_loader: failed to load model from ./models/3B/orca-mini-3b.q4_0.gguf

llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model './models/3B/orca-mini-3b.q4_0.gguf'
main: error: unable to load model

I then tried converting the model to a newer GGUF version by running quantize in COPY mode, but got the same error.

./build/bin/quantize ./models/3B/orca-mini-3b.q4_0.gguf ./models/3B/orca-mini-3b.q4_0-v2.gguf COPY
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3060 Laptop GPU, compute capability 8.6
main: build = 1575 (64e64aa)
main: built with cc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 for x86_64-linux-gnu
main: quantizing './models/3B/orca-mini-3b.q4_0.gguf' to './models/3B/orca-mini-3b.q4_0-v2.gguf' as COPY
gguf_init_from_file: GGUFv1 is no longer supported. please use a more up-to-date version
llama_model_quantize: failed to quantize: llama_model_loader: failed to load model from ./models/3B/orca-mini-3b.q4_0.gguf

main: failed to quantize model from './models/3B/orca-mini-3b.q4_0.gguf'

Do you remember which GGUF format version you used when creating this model? I am using the latest master of llama.cpp (updated today).
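For what it's worth, the format version can also be read directly from the file header. This is a minimal sketch assuming the standard GGUF layout (a 4-byte `GGUF` magic followed by a little-endian uint32 version); the path is just my local file:

```python
import struct

def gguf_version(path):
    """Read the GGUF magic and format version from the file header.

    A GGUF file begins with the 4-byte magic b'GGUF', followed by a
    little-endian uint32 holding the format version (1, 2, 3, ...).
    """
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file (magic={magic!r})")
        (version,) = struct.unpack("<I", f.read(4))
    return version

# Example (path is my local model file):
# print(gguf_version("./models/3B/orca-mini-3b.q4_0.gguf"))
```

Running this on the model prints 1 for me, which matches the "GGUFv1 is no longer supported" error above.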

Thanks!
