Does this use imatrix?
Seems imatrix can improve quants, so does this use it? At 2 bits it sometimes acts a bit strange, as expected, but I have seen other quants of similarly sized models do better. I haven't tried 3 bits yet, though I might be able to squeeze it into 64 GB of RAM. Also, I think imatrix needs a calibration dataset, which should ideally cover all the languages this model supports well. But I imagine it would greatly help with cases like this even with a lazy dataset.
These are regular quants (without imatrix)
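For reference, producing imatrix quants with llama.cpp would look roughly like the sketch below. This is just an outline, assuming an F16 GGUF of the model and a calibration text file; the file names are placeholders, not what was used here.

```sh
# Build an importance matrix from a calibration text file
# (calibration.txt is a placeholder; ideally it covers the languages the model supports)
./imatrix -m ./models/command-r-plus-F16.gguf -f calibration.txt -o imatrix.dat

# Quantize using the importance matrix
./quantize --imatrix imatrix.dat \
    ./models/command-r-plus-F16.gguf \
    ./models/command-r-plus-Q2_K.gguf Q2_K
```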
Could you tell me which llama.cpp fork you are using and what the SHA-256 hash of the weights is? There was an issue with F16 token embeddings, and I would like to make sure this is not related to it.
Oddly, it's often periods that it struggles with, and it tends to pick similar words to replace them with. Ignore the broken fonts; that's an issue with my terminal and CJK languages, and I'm not sure what causes it.
Anyway, as for the checksum, it's 47e139a57872a72096c05b043b1ec6c2f08451da7df0d84d45168708667b98f5 ./models/command-r-plus-Q2_K.gguf
and I am running this PR: https://github.com/ggerganov/llama.cpp/pull/6491 at commit d2924073ee9bdd600d22ded4e2d5fe30e69783a7
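In case anyone wants to compare against the same setup, this is roughly how the hash and commit above can be reproduced; a sketch assuming a clone of ggerganov/llama.cpp with `origin` pointing at it (the local branch name is just an example):

```sh
# Verify the quantized weights
sha256sum ./models/command-r-plus-Q2_K.gguf

# Fetch the PR branch and check out the specific commit
git fetch origin pull/6491/head:pr-6491
git checkout d2924073ee9bdd600d22ded4e2d5fe30e69783a7
```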
Tried Q3, and it's a big improvement. I still have more spare memory than I thought I would.
I have some early perplexity results on wikitext-2-raw, and they seem to confirm this improvement:
| Test | PPL Value | Standard Deviation |
|---|---|---|
| Q2_K | 5.7178 | +/- 0.03418 |
| Q3_K_L | 4.6214 | +/- 0.02629 |
| Q4_K_M | 4.4625 | +/- 0.02522 |
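For reference, this kind of measurement can be run with llama.cpp's perplexity tool; a minimal sketch with placeholder paths, assuming the wikitext-2-raw test split is extracted locally:

```sh
# Measure perplexity of a quantized model on the wikitext-2-raw test set
./perplexity -m ./models/command-r-plus-Q3_K_L.gguf -f wikitext-2-raw/wiki.test.raw
```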