Edit model card

Phi-3.5-Mini-instruct-quantized-autoround-asym-4bit

Model Description:

Phi-3.5-Mini-instruct-quantized-autoround-asym-4bit is a 4-bit asymmetrically quantized version of the Microsoft Phi-3.5 Mini Instruct model. The original model was quantized using the GPTQ (Generative Pre-trained Transformer Quantization) method and asymmetric quantization, reducing its size from 7.6 GB to 2.28 GB.

Intended Use:

This quantized model can be used for various natural language processing tasks, such as text generation, language translation, and question answering. Its reduced size allows for deployment on devices with limited memory, such as GPUs with less than 8 GB of VRAM.

Limitations:

*The quantization process may result in some loss of precision compared to the original model. *The model's performance may be slightly lower than the full-precision version. *The model may not be suitable for tasks requiring high precision or exact numerical computations.

Training Procedure:

The quantization process was performed using the AutoGPTQ library and the GPTQ algorithm. The model was quantized to 4-bit precision using asymmetric quantization with automatic rounding.

Evaluation:

The model's performance was evaluated before and after quantization using the perplexity metric. The evaluation process was symmetric, using the same metric and procedure for both the original and quantized models.

Quantization Configuration:

Quantization method: GPTQ (Generative Pre-trained Transformer Quantization) Bits: 4 Symmetric quantization: False(asymmetric quantization used).

Hardware Requirements:

The quantized model can be run on GPUs with less than 8 GB of memory, thanks to its reduced size of 2.28 GB.

License:

The model is released under the Apache License 2.0.

Contact:

For any questions or feedback, please contact the model creator, Satwik, on LinkedIn or via email.

Downloads last month
37
Safetensors
Model size
684M params
Tensor type
I32
·
FP16
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for Satwik11/Phi-3.5-Mini-instruct-quantized-autoround-asym-4bit

Quantized
(86)
this model