Great Job! Running on 4090 with 19968 Context length.

Opened by Lucifael

Really great model! Is there currently a sweet spot for the 4090? I use oobabooga.

For me, the best (recent) models for 24GB VRAM cards (3090/4090) are 22/24B at Q6 / 6-6.8bpw (Mistral) or 32B at Q4 / 4-4.5bpw (Qwen2.5 or CMD-R); you might also fit a 39B Skyfall at an IQ4_XS quant. In my experience the new 70B models (Llama 3.x) are quite bad at the low quants needed to fit into 24GB VRAM - they are slower and often worse than a higher quant of a smaller model. For me, Miqu was the last 70B model that was still decent at Q2 / 2-2.5bpw - I mostly tried Midnight-Miqu, Dark-Miqu and Nimbus-Miqu - but Miqu is quite old now and modern smaller models have caught up to it.
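For anyone wondering how these size/quant combinations line up with 24GB, here is a rough back-of-the-envelope sketch. It assumes weights take roughly params × bits-per-weight / 8 bytes plus a fixed overhead, ignores the KV cache for the chosen context length, and the helper and numbers are illustrative only, not exact figures for any particular loader:

```python
# Rough VRAM estimate for a quantized model: weights + fixed overhead.
# Illustrative approximation only; real usage depends on loader, context
# length (KV cache), and quant format details.

def estimate_vram_gb(params_b: float, bpw: float, overhead_gb: float = 1.5) -> float:
    """Approximate GB needed for the weights alone, plus a fixed overhead
    for activations/buffers. KV cache for long context comes on top."""
    weights_gb = params_b * bpw / 8  # params (billions) * bits-per-weight / 8 bytes
    return weights_gb + overhead_gb

# Sizes discussed above (values are rough):
for name, params_b, bpw in [
    ("22B @ 6.5bpw (Mistral)", 22, 6.5),
    ("32B @ 4.25bpw (Qwen2.5 / CMD-R)", 32, 4.25),
    ("39B @ IQ4_XS (~4.3bpw, Skyfall)", 39, 4.3),
    ("70B @ 2.5bpw (Llama 3.x / Miqu)", 70, 2.5),
]:
    print(f"{name}: ~{estimate_vram_gb(params_b, bpw):.1f} GB (plus KV cache)")
```

All four land in the 18-24 GB range, which is why the 70B models only fit at very low quants and why the 22B-39B range with mid quants tends to be the sweet spot on a single 24GB card.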
