This is a GPTQ 4-bit version of Auto-J-13B. We converted it using this script (by TheBloke).
To use the 4-bit version of Auto-J, install the following packages:
```bash
pip install safetensors
pip install "transformers>=4.32.0" "optimum>=1.12.0"
pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/  # Use cu117 if on CUDA 11.7
```
Loading this model takes about 8 GB of VRAM, and we provide a usage example in example_gptq4bits.py.
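For reference, a minimal sketch of loading and querying the quantized model with transformers (4.32+ loads GPTQ checkpoints directly when auto-gptq is installed). The repo id and the prompt below are placeholders, not the official usage; see example_gptq4bits.py for the exact invocation and prompt template.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "GAIR/autoj-13b-GPTQ-4bits"  # assumed repo id; replace with the actual one

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",           # place the ~8 GB of weights on the available GPU
    torch_dtype=torch.float16,   # non-quantized modules stay in fp16
)

# Placeholder prompt; use the prompt template from example_gptq4bits.py in practice.
prompt = "Write a critique of the following response to the user's query: ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```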
Note that the behaviour of the quantized model may differ from that of the original model.
Please refer to our GitHub repo for more details.