This is a 4-bit version of CodeLlama-70b, quantized with bitsandbytes. For more information about the base model, refer to its model page.
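As a rough illustration of how a pre-quantized bitsandbytes checkpoint is typically loaded, here is a minimal sketch using the `transformers` library. It assumes the repository id `cmarkea/CodeLlama-70b-hf-4bit` (the name under which this model is listed), that `transformers`, `accelerate`, and `bitsandbytes` are installed, and that a CUDA GPU with enough memory is available. The helper names `load_quantized_model` and `complete` are hypothetical and are not part of the model's own documentation.

```python
# Assumed repository id, taken from the model's collection listing.
REPO_ID = "cmarkea/CodeLlama-70b-hf-4bit"


def load_quantized_model(repo_id: str = REPO_ID):
    """Load the tokenizer and the 4-bit model (hypothetical helper).

    The heavy imports live inside the function so that this module can be
    imported on a machine without a GPU or the libraries installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    # For a checkpoint saved in 4-bit, the quantization settings are expected
    # to be read from the checkpoint's own config, so none are passed here.
    # device_map="auto" lets accelerate place the layers on available devices.
    model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")
    return tokenizer, model


def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Generate a continuation of `prompt` (hypothetical helper)."""
    tokenizer, model = load_quantized_model()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `complete("def fibonacci(n):")` would download the weights on first use and return the prompt followed by the model's continuation.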

Impact on performance

The following figure plots the performance of a set of models against the RAM each one requires. The quantized models perform on par with their unquantized counterparts while needing significantly less RAM.
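To give a sense of the scale of the RAM saving, the weight storage alone can be estimated with simple arithmetic. The sketch below assumes a nominal 70 billion parameters (inferred from the model name, not an exact count) and a 16-bit baseline. It ignores quantization constants, activations, and the key-value cache, so real usage will be higher.

```python
def weight_memory_gib(n_params: int, bits_per_param: int) -> float:
    """Approximate memory needed to store the weights, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


# Nominal parameter count: an assumption based on the "70b" in the model name.
N_PARAMS = 70_000_000_000

baseline_gib = weight_memory_gib(N_PARAMS, 16)  # 16-bit weights
quantized_gib = weight_memory_gib(N_PARAMS, 4)  # 4-bit weights

print(f"16-bit weights: {baseline_gib:.1f} GiB")
print(f" 4-bit weights: {quantized_gib:.1f} GiB")
print(f"reduction factor: {baseline_gib / quantized_gib:.0f}x")
```

Under these assumptions the weights shrink by a factor of four, from roughly 130 GiB to roughly 33 GiB.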


Safetensors model size: 35.8B params

Tensor types: F32, BF16
