Hello, I have some issues with my CMD Ollama while installing this model

#3
by NCGWRjason - opened

Hello,

I followed the official Ollama instructions on Hugging Face to install your Llama-3.2-1B-Instruct-GGUF model from the CMD terminal:
https://huggingface.co./docs/hub/ollama

Command run in CMD:
ollama run hf.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated-GGUF

I noticed that the Llama-3.2-1B-Instruct-GGUF repository contains multiple GGUF files.

However, after downloading, I found only one model file, specifically:
hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:latest 807MB

Could you please clarify which specific GGUF file this corresponds to and why only one file was downloaded?

Thank you.

deleted
edited Nov 21

Not that it helps you directly, but if I don't create it myself, I normally just download the GGUF I want manually from HF to my machine. Then I import it into whatever tool I want to use, depending on its needs (Open WebUI, ooba's text gen, llamafile, my own code, whatever). Doing it that way always works for me, and I end up with a backup of the file that I can store.

I don't normally use a tool's own utilities to download for me.
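
For Ollama specifically, the manual route could look roughly like the sketch below. It assumes the GGUF has already been downloaded by hand; the file name and the local model name are placeholders, not something taken from the repo listing. It writes a one-line Modelfile and prints the follow-up commands rather than running them.

```shell
#!/bin/sh
# Sketch: import a manually downloaded GGUF file into Ollama.
# ASSUMPTION: the file name below is a guess; use the name of the file you downloaded.
gguf_file="Llama-3.2-1B-Instruct-Q4_K_M.gguf"
# Hypothetical local model name; any name you like works here.
model_name="llama3.2-1b-local"

# Scratch directory that will hold the Modelfile (copy the GGUF next to it).
workdir=$(mktemp -d)

# A Modelfile whose FROM line points at the local GGUF is enough for a basic import.
printf 'FROM ./%s\n' "$gguf_file" > "$workdir/Modelfile"

# Print the next steps instead of executing them, so nothing runs by accident.
echo "Copy $gguf_file into $workdir, then run from that directory:"
echo "  ollama create $model_name -f Modelfile"
echo "  ollama run $model_name"
```

The relative path in FROM is resolved against the Modelfile's location, which is why the GGUF needs to sit in the same directory.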

Could you please clarify which specific GGUF file this corresponds to and why only one file was downloaded?

RTFM?
From https://huggingface.co./docs/hub/ollama that you linked:

Custom Quantization

By default, the Q4_K_M quantization scheme is used, when it’s present inside the model repo. If not, we default to picking one reasonable quant type present inside the repo.

To select a different scheme, simply:

  1. From Files and versions tab on a model page, open GGUF viewer on a particular GGUF file.
  2. Choose ollama from Use this model dropdown.

The snippet would be in format (quantization tag added):

ollama run hf.co/{username}/{repository}:{quantization}

This is clear; which part don't you understand?

However, after downloading, I found only one model file, specifically:
hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:latest 807MB

You downloaded Q4_K_M.
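
As a rough illustration of the tagged format from the docs quoted above, here is a small sketch that builds the model reference for a chosen quantization. The helper name `hf_ollama_ref` is made up for this example, and `Q8_0` is only an assumed tag; check the Files and versions tab for the quants the repo really has. The script prints the command instead of running it.

```shell
#!/bin/sh
# Build an Ollama model reference for a GGUF repo on Hugging Face.
# Format from the quoted docs: hf.co/{username}/{repository}:{quantization}
# hf_ollama_ref is a hypothetical helper, not part of Ollama or Hugging Face tooling.
hf_ollama_ref() {
    # $1 = username, $2 = repository, $3 = quantization tag (optional)
    if [ -n "${3:-}" ]; then
        printf 'hf.co/%s/%s:%s\n' "$1" "$2" "$3"
    else
        # No tag: the default described in the docs applies (Q4_K_M when present).
        printf 'hf.co/%s/%s\n' "$1" "$2"
    fi
}

# ASSUMPTION: Q8_0 is an example tag; confirm it exists in the repo before using it.
ref=$(hf_ollama_ref bartowski Llama-3.2-1B-Instruct-GGUF Q8_0)

# Print the command rather than executing it, so no download starts by accident.
echo "ollama run $ref"
```

Leaving the tag off gives the same reference as the original command, which is what ended up as `:latest` in the ollama list.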

I mean that I ran this command in the CMD terminal:
ollama run hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF

The /bartowski/Llama-3.2-1B-Instruct-GGUF repository contains multiple GGUF files. (https://huggingface.co./bartowski/Llama-3.2-1B-Instruct-GGUF/tree/main)

However, after downloading, I found only one model file in the ollama list, specifically:
hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:latest 807MB

So, I specifically ran bartowski/Llama-3.2-1B-Instruct-GGUF, but it only downloaded Q4_K_M.gguf.
Is this the default model being downloaded? Or is it because this was the most recently uploaded version?
That’s why it appears as Llama-3.2-1B-Instruct-GGUF:latest in the ollama list.

Thank you

Is there a reason you want to download all the sizes..?

Q4_K_M is just the default that ollama uses
