This is a GPTQ 4-bit version of Auto-J-13B. We converted it using this script (by TheBloke).

To use the 4-bit version of Auto-J, install the following packages:

pip install safetensors
pip install "transformers>=4.32.0" "optimum>=1.12.0"
pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/  # Use cu117 if on CUDA 11.7

Loading this model takes about 8 GB of VRAM. We provide a usage example in example_gptq4bits.py; a minimal loading sketch is also shown below.
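
The following is a minimal sketch of loading the quantized checkpoint with transformers + optimum + auto-gptq (the packages installed above). The model id "GAIR/autoj-13b-GPTQ-4bits" and the prompt text are assumptions for illustration; use the actual repository id and the Auto-J prompt template from our repo (see example_gptq4bits.py for the full example).

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "GAIR/autoj-13b-GPTQ-4bits"  # assumption: replace with the actual repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
# With transformers>=4.32.0, optimum and auto-gptq installed, GPTQ checkpoints
# load directly; device_map="auto" places the weights on the available GPU.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "..."  # build this with the Auto-J prompt template from our repo
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))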

Note that the behaviour of the quantized model may differ from that of the original model.

Please refer to our GitHub repo for more details.
