This is a 4-bit quantized version of Qwen2.5-Coder-7B-Instruct, converted using bitsandbytes. For more information about the base model, refer to its model page.
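
As an illustration, the quantized weights can be loaded directly with the transformers library. This is a minimal sketch, assuming a standard transformers + bitsandbytes installation and an available GPU; only the repository name is taken from this card:

```python
# Minimal loading sketch (assumption: transformers, accelerate and bitsandbytes are installed).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cmarkea/Qwen2.5-Coder-7B-Instruct-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# The quantization config stored in the repository is picked up automatically,
# so the weights are loaded in 4-bit without further arguments.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # place the 4-bit weights on the available GPU(s)
)
```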

Impact on performance

Figure: impact of quantization on a set of models.

We evaluated the models using the PoLL (Pool of LLM) technique, with a panel of large judge models (GPT-4o, Gemini 1.5 Pro, and Claude 3.5 Sonnet). Scores range from 0, indicating a model unsuitable for the task, to 5, representing a model that fully meets expectations. The evaluation was based on 67 instructions across four programming languages: Python, Java, JavaScript, and pseudo-code. All tests were conducted in a French-language context, and models were heavily penalized if they responded in another language, even when the response was technically correct.
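
For illustration only, the aggregation behind such a panel score can be sketched as a simple average over judges and instructions. The function and data layout below are hypothetical and are not the exact evaluation code:

```python
# Hypothetical sketch of PoLL-style aggregation: each judge assigns a 0-5 score
# to every instruction answered by a model, and the model's final score is the
# mean over all judges and all instructions.
from statistics import mean

def poll_score(scores_by_judge: dict[str, list[float]]) -> float:
    """scores_by_judge maps a judge name (e.g. 'gpt-4o') to its 0-5 scores
    over the 67 instructions; the panel score is the overall mean."""
    all_scores = [s for judge_scores in scores_by_judge.values() for s in judge_scores]
    return mean(all_scores)

# Example with made-up numbers:
# poll_score({"gpt-4o": [4, 5, 3], "gemini-1.5-pro": [4, 4, 5], "claude-3.5-sonnet": [5, 4, 4]})
```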

Performance Scores (on a scale of 5):

| Model | Score | # Params (B) | Size (GB) |
|---|---|---|---|
| gemini-1.5-pro | 4.51 | NA | NA |
| gpt-4o | 4.51 | NA | NA |
| claude3.5-sonnet | 4.49 | NA | NA |
| Qwen/Qwen2.5-Coder-32B-Instruct | 4.41 | 32.8 | 65.6 |
| Qwen/Qwen2.5-32B-Instruct | 4.40 | 32.8 | 65.6 |
| cmarkea/Qwen2.5-32B-Instruct-4bit | 4.36 | 32.8 | 16.4 |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | 4.24 | 15.7 | 31.4 |
| meta-llama/Meta-Llama-3.1-70B-Instruct | 4.23 | 70.06 | 141.2 |
| cmarkea/Meta-Llama-3.1-70B-Instruct-4bit | 4.14 | 70.06 | 35.3 |
| Qwen/Qwen2.5-Coder-7B-Instruct | 4.11 | 7.62 | 15.24 |
| cmarkea/Qwen2.5-Coder-7B-Instruct-4bit | 4.08 | 7.62 | 3.81 |
| cmarkea/Mixtral-8x7B-Instruct-v0.1-4bit | 3.8 | 46.7 | 23.35 |
| meta-llama/Meta-Llama-3.1-8B-Instruct | 3.73 | 8.03 | 16.06 |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | 3.33 | 46.7 | 93.4 |
| codellama/CodeLlama-13b-Instruct-hf | 3.33 | 13 | 26 |
| codellama/CodeLlama-34b-Instruct-hf | 3.27 | 33.7 | 67.4 |
| codellama/CodeLlama-7b-Instruct-hf | 3.19 | 6.74 | 13.48 |
| cmarkea/CodeLlama-34b-Instruct-hf-4bit | 3.12 | 33.7 | 16.35 |
| codellama/CodeLlama-70b-Instruct-hf | 1.82 | 69 | 138 |
| cmarkea/CodeLlama-70b-Instruct-hf-4bit | 1.64 | 69 | 34.5 |

As the table shows, the impact of 4-bit quantization on performance is negligible: each quantized model scores close to its full-precision counterpart at roughly half the size.

Prompt Pattern

As a reminder, here is the prompt pattern used to interact with the model:

<|im_start|>user\n{user_prompt_1}<|im_end|><|im_start|>assistant\n{model_answer_1}<|im_end|>...
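
In practice, this template does not need to be written by hand. The following sketch uses the tokenizer's built-in chat template (standard transformers API) and assumes `model` and `tokenizer` were loaded as in the earlier sketch; the French prompt is just an example:

```python
# Sketch: build the ChatML prompt via the tokenizer's chat template and generate.
# Assumes `model` and `tokenizer` are already loaded as shown above.
messages = [
    {"role": "user", "content": "Écris une fonction Python qui inverse une chaîne de caractères."},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # appends the assistant turn opener to the prompt
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```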