Part of the AutoGPTQ-INT4-gs128 collection: models quantized to the AutoGPTQ format with Intel AutoRound (INT4, group size 128).
Quantized version of tiiuae/Falcon3-3B-Instruct using torch.float32 for quantization tuning.
Quantization framework: Intel AutoRound v0.4.4
Note: this INT4 version of Falcon3-3B-Instruct has been quantized to run inference on CPU.

I suggest installing the requirements in a dedicated Python virtualenv or conda environment:
```bash
# Download and unpack Intel AutoRound v0.4.4
wget https://github.com/intel/auto-round/archive/refs/tags/v0.4.4.tar.gz
tar -xvzf v0.4.4.tar.gz
cd auto-round-0.4.4

# Install the CPU requirements, then build and install the package (editable, CPU extras)
pip install -r requirements-cpu.txt --upgrade
pip install -vvv --no-build-isolation -e .[cpu]
```
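As a quick sanity check, the editable install can be verified from Python (assuming the package exposes `__version__`, as recent releases do):

```python
# The import should succeed and report the tag built above
import auto_round
print(auto_round.__version__)  # expected: 0.4.4
```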
Quantization was then performed with the following script:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

# Load the original model and tokenizer
model_name = "tiiuae/Falcon3-3B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# INT4, group size 128, asymmetric (sym=False), tuned on CPU without mixed precision
bits, group_size, sym, device, amp = 4, 128, False, 'cpu', False

autoround = AutoRound(model, tokenizer, nsamples=128, iters=200, seqlen=512, batch_size=4,
                      bits=bits, group_size=group_size, sym=sym, device=device, amp=amp)
autoround.quantize()

# Export the quantized model in AutoGPTQ format
output_dir = "./AutoRound/tiiuae_Falcon3-3B-Instruct-autogptq-int4-gs128-asym"
autoround.save_quantized(output_dir, format='auto_gptq', inplace=True)
```
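For inference, a minimal sketch could look like the following; it assumes the exported directory above (or a local download of this repository) and a transformers install that can load AutoGPTQ checkpoints on CPU:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Path to the exported weights above, or a local copy of this repository
quantized_dir = "./AutoRound/tiiuae_Falcon3-3B-Instruct-autogptq-int4-gs128-asym"

model = AutoModelForCausalLM.from_pretrained(quantized_dir)
tokenizer = AutoTokenizer.from_pretrained(quantized_dir)

inputs = tokenizer("What is INT4 quantization?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```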
This quantized model comes with no warranty. It has been developed only for research purposes.