This is a version of CodeLlama-34b converted to 4-bit using bitsandbytes. For more information about the base model, refer to its model page.
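
Since the checkpoint is already stored in 4-bit, it can be loaded directly with transformers, provided bitsandbytes is installed. The sketch below shows a minimal loading and generation path; the prompt is purely illustrative, and the standard `from_pretrained` flow for pre-quantized bitsandbytes checkpoints is assumed.

```python
# Minimal usage sketch: the checkpoint is already quantized to 4-bit,
# so it loads directly with transformers.
# Requires: pip install transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cmarkea/CodeLlama-34b-hf-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # dispatch layers across available GPU(s)
)

# Illustrative prompt (this is an assumption, not from the card)
prompt = "def quicksort(arr):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```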

Impact on performance

The figure below compares the performance of a set of models against their required RAM. The quantized models deliver performance equivalent to their unquantized counterparts while using significantly less RAM.

[Figure: model performance versus RAM usage for quantized and unquantized models]
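
One way to check the memory side of this comparison yourself (a sketch, not necessarily the methodology behind the figure) is transformers' `get_memory_footprint()`, which reports the size of the model's parameters and buffers in bytes:

```python
# Sketch of a memory-footprint check for the 4-bit checkpoint.
from transformers import AutoModelForCausalLM

model_4bit = AutoModelForCausalLM.from_pretrained(
    "cmarkea/CodeLlama-34b-hf-4bit",
    device_map="auto",
)
# get_memory_footprint() returns bytes; convert to GB for readability.
print(f"4-bit footprint: {model_4bit.get_memory_footprint() / 1e9:.1f} GB")
```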
