This is a GPTQ 4-bit version of Auto-J-13B. We converted it using this script (by TheBloke).
To use the 4-bit version of Auto-J, install the following packages:
```bash
pip install safetensors
pip install "transformers>=4.32.0" "optimum>=1.12.0"
pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/  # Use cu117 if on CUDA 11.7
```
Loading this model takes about 8 GB of VRAM, and we provide a usage example in example_gptq4bits.py.
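For reference, a minimal sketch of loading and querying the quantized model with transformers (4.32+ loads GPTQ checkpoints directly when auto-gptq is installed). The repo id and the prompt below are placeholders, not the official usage; see example_gptq4bits.py for the exact invocation and prompt template.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "GAIR/autoj-13b-GPTQ-4bits"  # assumed repo id; replace with the actual one

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",           # place the ~8 GB of weights on the available GPU
    torch_dtype=torch.float16,   # non-quantized modules stay in fp16
)

# Placeholder prompt; use the prompt template from example_gptq4bits.py in practice.
prompt = "Write a critique of the following response to the user's query: ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```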
Note that the behaviour of the quantized model may differ from that of the original model.
Please refer to our GitHub repo for more details.