Various models in GGUF format quantized with a new 2-bit approach. Intended for use with llama.cpp. Requires llama.cpp PR 4773.

Update: PR 4773 has been merged into llama.cpp, but I have added new models that require PR 4856. The new models are those that have around 2.3-2.4 bpw. They have a lower quantization error at the xpense of being ~10% larger.

Downloads last month
414
GGUF
Model size
13B params
Architecture
llama
Inference Providers NEW
This model is not currently available via any of the supported Inference Providers.
The model cannot be deployed to the HF Inference API: The model has no library tag.