Great Job! Running on 4090 with 19968 Context length.

Opened by Lucifael

Really great model! Is there currently a sweet spot for the 4090? I use oobabooga.

For me, the best (recent) models for 24GB VRAM cards (3090/4090) are 22/24B at Q6 / 6-6.8bpw (Mistral) or 32B at Q4 / 4-4.5bpw (Qwen2.5 or CMD-R); you might also fit a 39B Skyfall at an IQ4_XS quant. In my experience the new 70B models (Llama 3.x) are quite bad at the low quants needed to fit into 24GB VRAM - they are slower and often worse than a higher quant of a smaller model. For me, Miqu was the last 70B model that was still decent at Q2 / 2-2.5bpw - I mostly tried Midnight-Miqu, Dark-Miqu and Nimbus-Miqu - but Miqu is quite old now and modern smaller models have caught up to it.
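For anyone wondering how these size/quant combinations line up with 24GB, here is a rough back-of-the-envelope sketch. It assumes weights take roughly params × bits-per-weight / 8 bytes plus a fixed overhead, ignores the KV cache for the chosen context length, and the helper and numbers are illustrative only, not exact figures for any particular loader:

```python
# Rough VRAM estimate for a quantized model: weights + fixed overhead.
# Illustrative approximation only; real usage depends on loader, context
# length (KV cache), and quant format details.

def estimate_vram_gb(params_b: float, bpw: float, overhead_gb: float = 1.5) -> float:
    """Approximate GB needed for the weights alone, plus a fixed overhead
    for activations/buffers. KV cache for long context comes on top."""
    weights_gb = params_b * bpw / 8  # params (billions) * bits-per-weight / 8 bytes
    return weights_gb + overhead_gb

# Sizes discussed above (values are rough):
for name, params_b, bpw in [
    ("22B @ 6.5bpw (Mistral)", 22, 6.5),
    ("32B @ 4.25bpw (Qwen2.5 / CMD-R)", 32, 4.25),
    ("39B @ IQ4_XS (~4.3bpw, Skyfall)", 39, 4.3),
    ("70B @ 2.5bpw (Llama 3.x / Miqu)", 70, 2.5),
]:
    print(f"{name}: ~{estimate_vram_gb(params_b, bpw):.1f} GB (plus KV cache)")
```

All four land in the 18-24 GB range, which is why the 70B models only fit at very low quants and why the 22B-39B range with mid quants tends to be the sweet spot on a single 24GB card.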
