Works with 16 GB RAM, 8 GB VRAM, BUT...
Hey, today I was excited to download the latest version of KoboldCpp, v1.30.1, which apparently now fully supports the new k-quants on CLBlast. Yay!
I downloaded one of my favorite models, this time one level higher than what I would usually go for, just to test how it runs at the edge of my hardware's capabilities. For this test I chose the smallest quantization of Guanaco-33B-GGML: guanaco-33B.ggmlv3.q2_K.bin.
Initially I was struggling to make it run! The model loaded every time without a problem, BUT it kept crashing with out-of-memory errors. 😅
I ended up lowering the BLAS batch size from my usual 1024 down to 512 and tried again. Surprise, surprise! This time it worked! 🥳
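For anyone who wants to try the same fix, a launch command along these lines should do it. The `--useclblast` platform/device indices (`0 0` here) are an assumption for my setup and may differ on yours:

```shell
# Hypothetical launch command, not my exact invocation.
# --useclblast takes OpenCL platform and device indices (0 0 is an assumption;
# check your own GPU with clinfo). Lowering --blasbatchsize shrinks the
# prompt-processing buffers, which is what stopped the out-of-memory crashes.
python koboldcpp.py --model guanaco-33B.ggmlv3.q2_K.bin \
  --useclblast 0 0 \
  --blasbatchsize 512
```

If 512 still crashes, dropping further (256 or 128) trades prompt-processing speed for less memory.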
So much for the good news. Now for the bad news... It's super slow!!! I guess if there were a volunteer hosting Guanaco-65B on Kobold Lite, you would get your output there much faster and probably at higher quality too, but at least it works locally on the aforementioned hardware. I guess this is good news for those who don't care about speed. 🙂