
iQ3 XS broken?

by c3real2k - opened

Hi mradermacher,

first of all, thank you so much for all the quants you provide!

I have a problem with the iQ3 XS quant in this repository. The model returns nonsense. Is this because this repository should only include static quants while IQ3 implies imatrix quants?

I downloaded and checked the IQ3 XXS version from your other magnum 123b repository (the i1 gguf one); that version works no problem (same hardware, software, samplers, templates, prompts). It's not that the model puts out gibberish, it's just... strange. It's not following the instructions at all: it gives either short, denying answers, or repeats the last instruction. I'm not knowledgeable enough to know what's happening here. At first I thought it might be the prompt/template, but as I said, the same settings work with the IQ3 XXS quant. Tokenizer? I don't know... Thought I'd let you know.

The answer to my default question was rather amusing: I mean, yeah, asking for bash scripts is... almost depraved! Good thing that the model drew the line there ;-)

[Screenshot: Bildschirmfoto 2024-08-20 um 18.22.36.png]

I-quants are NOT imatrix quants. They have nothing to do with each other. I don't understand why everyone keeps confusing I-quants with imatrix quants. I-quants trade off additional computation during inference for less memory, while imatrix quants apply an importance matrix to give more bits to what is important. While those techniques can be combined, you can both have I-quants without an imatrix and imatrix quants without using I-quants.
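To make the distinction concrete, here is a minimal sketch using llama.cpp's own tools (llama-imatrix and llama-quantize); the file names and calibration text are placeholders, not the actual files used for this repository:

# Compute an importance matrix from some calibration text:
./llama-imatrix -m model-f16.gguf -f calibration.txt -o imatrix.dat

# I-quant WITHOUT an imatrix (a static quant, like the ones in this repository):
./llama-quantize model-f16.gguf model.IQ3_XS.gguf IQ3_XS

# imatrix quant WITHOUT an I-quant (a K-quant with imatrix weighting):
./llama-quantize --imatrix imatrix.dat model-f16.gguf model.Q4_K_M.gguf Q4_K_M

# Both combined (an imatrix I-quant, like the ones in the i1 repository):
./llama-quantize --imatrix imatrix.dat model-f16.gguf model.i1-IQ3_XS.gguf IQ3_XS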

What you get as output is not gibberish. Gibberish would be random tokens like the following:

isray02loageicui1ianasticistchegeelseaukschysastcrosszanitsanswnN63547693767676767760ennaremenHursonHuhahah-uBar-SHELLURZITzah-H SLAM05767676776776

What you are getting is an actual valid response from this model, based on how it was trained. If you are not satisfied with it for your use case, I recommend trying a different model. It seems quite unlikely that your issue is related to quantization; it is more likely a general issue with this specific model. Just to make sure, I recommend upgrading Text Generation Web UI to the latest version in case there was some issue with llama.cpp.

Oh, you are using chat-instruct. Can you please check whether you get the same issue in chat mode? For chat-instruct, things like the system prompt and chat template play an important role.

I-quants are NOT imatrix quants. They have nothing to do with each other. I don't understand why everyone keeps confusing I-quants with imatrix quants. I-quants trade off additional computation during inference for less memory, while imatrix quants apply an importance matrix to give more bits to what is important. While those techniques can be combined, you can both have I-quants without an imatrix and imatrix quants without using I-quants.

I see. Thanks for the clarification.

What you get as output is not gibberish. Gibberish would be random tokens like the following:

That's why I wrote it doesn't put out gibberish.

Can you please check whether you get the same issue in chat mode

Yes, tried that just now. i1 IQ3 XXS gave a fitting answer, describing the basics of bash scripting.
"Sure, I'd be happy to give you an introduction to bash scripting. Here's an overview of some key concepts: [...]"

IQ3 XS returned this:
[Screenshot: Bildschirmfoto 2024-08-21 um 14.42.22.png]
(or something similar like "What're your thoughts on this?", "What if you ask me, "What kind of assistant?"", ...)

The same problem occurs when interfacing with ooba via SillyTavern (using the context and instruct templates provided by anthracite).

What's the SHA256 hash of your concatenated IQ3_XS model? I just want to make sure nothing got corrupted during download or concatenation. I'm currently downloading it myself.

What's the SHA256 hash of your concatenated IQ3_XS model? I just want to make sure nothing got corrupted during download or concatenation. I'm currently downloading it myself.

(man I need faster storage...)
583af9978638d4ec0c5422720530afca92f5cd355142f4e5acbc345e669ef2b9 magnum-v2-123b.IQ3_XS.gguf

I concatenated with 'cat'. Did that for many other models before, had no problems so far. But yes, maybe a corrupted download? I'll download again as well...

Thanks for looking into it ;-)

wget https://huggingface.co./mradermacher/magnum-v2-123b-GGUF/resolve/main/magnum-v2-123b.IQ3_XS.gguf.part1of2
magnum-v2-123b.IQ3_XS.gguf.part1of2 100%[=================================================================>] 24,00G 26,7MB/s in 13m 36s
wget https://huggingface.co./mradermacher/magnum-v2-123b-GGUF/resolve/main/magnum-v2-123b.IQ3_XS.gguf.part2of2
magnum-v2-123b.IQ3_XS.gguf.part2of2 100%[=================================================================>] 22,70G 30,9MB/s in 12m 53s

cat magnum-v2-123b.IQ3_XS.gguf.part1of2 magnum-v2-123b.IQ3_XS.gguf.part2of2 > magnum-v2-123b.IQ3_XS.gguf_redownload
sha256sum magnum-v2-123b.IQ3_XS.gguf_redownload
583af9978638d4ec0c5422720530afca92f5cd355142f4e5acbc345e669ef2b9 magnum-v2-123b.IQ3_XS.gguf_redownload

So, the download doesn't seem to be the problem. I also updated ooba, still no luck. The settings for loading the model are the following:
[Screenshot: Bildschirmfoto 2024-08-21 um 15.48.38.png]

It's not crucial to get this quant running, as I can just use the imatrix IQ3 XXS one. I just thought I'd post it here in case someone else runs into similar problems.

// Edit: Lowering the context and disabling context quantization didn't help either. Got a weird tutorial on how to train LLMs though :D

You have to train a large language model to generate text from a dataset. Here is how to preprocess text for the AI:

  1. Remove any newline characters from the text
  2. Convert all text to lowercase
  3. Convert all text to titlecase
  4. Add a period at the beginning of the text
  5. Add a period at the end of the text
  6. Remove all newline characters from the text
  7. Remove all punctuation from the text
  8. Remove any alphabetic characters except for periods from the text
  9. Add a period before and after each sentence in the text
    1preprocessing) Tokenize the text by splitting on whitespace and punctuation
    2preprocessing) Tokenize the text by splitting on punctuation
    3preprocessing) Tokenize the text by splitting on whitespace and alphanumeric characters
    4preprocessing) Tokenize the text by splitting on alphanumeric characters
    5preprocessing) Tokenize the text by splitting on periods
    6preprocessing) Tokenize the text by splitting on newline characters
    I have a dataset of questions and answers. I train a large language model to classify the text.

583af9978638d4ec0c5422720530afca92f5cd355142f4e5acbc345e669ef2b9 magnum-v2-123b.IQ3_XS.gguf

I can confirm my download results in the same hash:
583af9978638d4ec0c5422720530afca92f5cd355142f4e5acbc345e669ef2b9 magnum-v2-123b.IQ3_XS.gguf

I concatenated with 'cat'. Did that for many other models before, had no problems so far. But yes, maybe a corrupted download? I'll download again as well...

Using cat is exactly how you are supposed to do it.

I just tested it using llama.cpp with ./llama-cli -m /bpool/magnum-v2-123b.IQ3_XS.gguf -p "I believe the meaning of life is" -n 128, and something strange is indeed going on. It constantly repeats itself in completion mode, but this could also be a quirk of the base model. Here are some examples:

I believe the meaning of life is to learn from the mistakes of our parents and from our own experiences. We learn to forgive, to forgive ourselves, to be humble, and to love others. I think we should all be kind and generous. We need to teach our children that we are all created equal. I think we should all be kind and generous. We need to teach our children that we are all created equal. I think we should all be kind and generous. We need to teach our children that we are all created equal. I think we should all be kind and generous. We need to teach our children that we are all created equal. I think we
I believe the meaning of life is to be in the world, not of it; and I believe that what we have to do is to be in the world, and that what we have to do is to be in the world. And I believe that what we have to do is to be in the world. And I believe that what we have to do is to be in the world, and that what we have to do is to be in the world. And I believe that what we have to do is to be in the world, and that what we have to do is to be in the world. And I believe that what we have to do is to be
I believe the meaning of life is to be found in the things we love and in the things we don't love. I believe that the things I love and the things I don't love are what make life worth living. I think this is a great quote from a famous person: "There are two kinds of things, the things we love and the things we don't love. We've come on deck at the very thing at hand." I think that's really the meaning of life. I believe that the things we love, the things we don't love, and the things we hate are two things, and I think it means a
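
Since plain completion ignores the chat template, one more thing worth trying is conversation mode. A sketch, assuming the model expects ChatML (which is what the anthracite templates suggest) and using a placeholder system prompt:

./llama-cli -m /bpool/magnum-v2-123b.IQ3_XS.gguf -cnv --chat-template chatml -p "You are a helpful assistant."

That way the same IQ3_XS file gets tested with the template applied, which is closer to what ooba does in chat mode.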

It's not crucial to get this quant running, as I can just use the imatrix IQ3 XXS one. I just thought I'd post it here in case someone else runs into similar problems.

Just use i1-IQ3_XXS for now, and maybe switch to i1-IQ3_XS once it is uploaded in a few days; it will be better than IQ3_XS anyway.

As for why people confuse them with imatrix quants: I think it's because they were introduced at the same time, by the same person, and specifically to support his imatrix quantisations. In fact, they originally were pretty much "imatrix quants" and only a bit later were also supported without imatrix quantisation (similarly, the Q* quants were originally not imatrix-weighted and only a bit later received support for imatrix "weighting").

At least, that's how I remember it. I was also confused at first about the relationship between IQ-quants and imatrix weighting. And in fact, my use of "weighted" seems wrong, too. I think the root cause is a lack of clear documentation and/or clear statements from the llama.cpp side for just about anything.

Very confusing, this.

Just use i1-IQ3_XXS for now, and maybe switch to i1-IQ3_XS once it is uploaded in a few days; it will be better than IQ3_XS anyway.

Using the i1-IQ3_XS was the plan all along; it just wasn't available yet.
Thank you for looking into it and (somewhat) confirming the issue, though!

As for why people confuse them with imatrix quants [...] I think the root cause is a lack of clear documentation

I just didn't "realize" there was an I-quant method and just assumed imatrix as soon as I saw an "I" in the filename. I knew of K-quants, though... So, confusion via lack of knowledge on my part. Whatever, I'm just rambling now. Have a good one, and thanks again!
