blocky blocky blocky
This is probably not the GGUF's fault, or anyone's, but I'm running into this "blocky blocky blocky" issue in oobabooga and can't test the unquantized model.
The model seems to run fine in LM Studio, so I assume oobabooga just needs to update something. I just wanted to know whether others are running into this too; if so, I can suggest LM Studio for now.
It's probably a missing update, but I also think you need to avoid CUDA offloading for now.
I didn't do any offloading to the CPU, if that's what you mean?
I tested an exl2 quantization at 4 bpw and it worked perfectly, so I think it's probably something on the Text Generation WebUI side, like an outdated library (llama.cpp or similar).
No, you want to do no offloading to the GPU, i.e. leave everything on your CPU; see the sketch below.
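In case it's unclear, here's a minimal sketch of what that means, assuming you load the GGUF with llama-cpp-python directly (which, as far as I know, is what the WebUI's llama.cpp loader is built on). The model path is just a placeholder:

```python
from llama_cpp import Llama

# n_gpu_layers=0 keeps every layer on the CPU, i.e. no CUDA offloading at all.
# The model path is a placeholder; point it at your actual GGUF file.
llm = Llama(model_path="model.Q4_K_M.gguf", n_gpu_layers=0)

out = llm("Hello", max_tokens=32)
print(out["choices"][0]["text"])
```

In the WebUI itself, this corresponds to setting the n-gpu-layers slider to 0 before loading the model.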
The bug can show up in exl2 as well, but I'm not sure why it doesn't always appear.
You can also enable flash attention for llama.cpp, which should work around the issue; example below.
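For anyone loading the model from a script, this is the setting I mean, assuming your llama-cpp-python build is recent enough to expose the flash_attn flag:

```python
from llama_cpp import Llama

# flash_attn=True enables llama.cpp's flash attention path, which reportedly
# works around the garbled output. The model path is again a placeholder.
llm = Llama(model_path="model.Q4_K_M.gguf", flash_attn=True)
```

In the WebUI, there should be a matching flash_attn checkbox in the llama.cpp loader settings.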