Quantized GGUF or exl2?
Hi @Undi95 ,
Thank you for uploading this. Do you know of any quantized versions of DBRX Instruct? It has been two days since the release, but there are still no quantized GGUF or exl2 versions of this model. I have 36GB VRAM and 96GB RAM, so it would be interesting to run the model locally (e.g. Q3_K_M or exl2 at 3.0 bpw).
Thanks!
https://github.com/ggerganov/llama.cpp/issues/6344
Llama.cpp doesn't support it yet, so no GGUF.
I don't think exllama supports it either, sadly.
That is very unfortunate. Looking forward to a 4-bit quantization to run it locally.
Turboderp recently added support for DBRX! Exllamav2 should now work with DBRX:
https://github.com/turboderp/exllamav2/issues/388#issuecomment-2027971687
Turbo also uploaded a bunch of quants for both base and instruct models:
https://huggingface.co./turboderp/dbrx-base-exl2
https://huggingface.co./turboderp/dbrx-instruct-exl2
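For anyone wondering what loading one of those quants looks like, here is a rough sketch based on exllamav2's standard Python example API (the model directory is a placeholder, so point it at whichever branch of the repos above you download; sampling settings are just illustrative):

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

# Placeholder path: a locally downloaded exl2 quant, e.g. a branch of
# turboderp/dbrx-instruct-exl2
config = ExLlamaV2Config()
config.model_dir = "/path/to/dbrx-instruct-exl2"
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # lazy cache so autosplit can size layers per GPU
model.load_autosplit(cache)               # spread layers across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

# Illustrative sampling settings
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7
settings.top_p = 0.9

print(generator.generate_simple("What is DBRX?", settings, 200))
```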
Thanks for the heads up!