The versions beyond Q4 quantization are completely unavailable

#1
by yuiaa001 - opened

The Q4_K_M version performed well, but none of the versions higher than Q4 answered as expected.

I am running it through ollama 0.1.41.

Can you explain what "unavailable" means?

And what answer do you get, and what do you expect? As such, this posting is pretty useless.

Just tried out the Q6_K and it works fine. Make sure you downloaded the files correctly and fully, and maybe consult a support forum for ollama on how to set it up.
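One quick way to check that the files came down intact (a sketch, not an official procedure; the filename is a placeholder) is to hash the local GGUF and compare it against the SHA256 shown on the model page's file view:

```python
import hashlib

# Placeholder filename -- substitute the quant you actually downloaded.
path = "model-Q6_K.gguf"

h = hashlib.sha256()
with open(path, "rb") as f:
    # Hash in 1 MiB chunks so multi-GB GGUF files don't need to fit in RAM.
    for chunk in iter(lambda: f.read(1024 * 1024), b""):
        h.update(chunk)

# Compare this digest with the SHA256 listed for the file on Hugging Face.
print(h.hexdigest())
```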

mradermacher changed discussion status to closed

They're just throwing out random outputs; only the Q4 version can answer the question correctly.

Are you using this model through ollama?

No, I use llama.cpp, which ollama also uses (but likely an older version).
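If you want to rule out ollama's bundled llama.cpp, one option (a sketch assuming the llama-cpp-python bindings; the model path is a placeholder) is to load the same GGUF directly and use the chat API, which applies the template embedded in the file:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Placeholder path -- point this at the quant that misbehaves under ollama.
llm = Llama(model_path="model-Q6_K.gguf", n_ctx=2048)

# create_chat_completion() applies the chat template stored in the GGUF
# metadata, so a bad or missing template in ollama's config is ruled out too.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```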

The older llama.cpp version might be the cause of this issue. Even with the f16 model, I still get irrelevant answers.

It's probably a configuration/usage issue, though.
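To make "configuration/usage issue" concrete (this is a guess at one common cause, not a diagnosis of this exact case): sending a bare prompt to an instruction-tuned model bypasses its chat template, which often produces exactly this kind of off-topic output. Continuing the hypothetical llama-cpp-python sketch above:

```python
# A bare completion call skips the chat template entirely; instruction-tuned
# models frequently ramble or ignore the question when prompted this way.
raw = llm("What is the capital of France?", max_tokens=64)
print(raw["choices"][0]["text"])

# If this output is irrelevant but create_chat_completion() above answers
# correctly, the problem is the prompt format, not the quantization.
```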

[screenshot: the f16 model's output]

It seems unable to understand my question.
