---
license: mit
pipeline_tag: text-generation
tags:
- code
- deepseek_v3
- qwen
- int4
- conversational
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
---
# DeepSeek-R1-Distill-Qwen-32B-AWQ wint4

A distillation of DeepSeek-R1 into Qwen 32B, quantized to 4-bit weights (wint4) with AWQ. The quantized model fits on any GPU with 24 GB of VRAM, or on a device with 32 GB of unified memory.
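As a rough sanity check on the memory claim above, the wint4 footprint can be estimated from the parameter count. The figures below (parameter count, AWQ group size, per-group scale/zero overhead) are assumptions for illustration, not values taken from this card:

```python
# Back-of-the-envelope estimate of the wint4 weight footprint.
# All constants are assumptions, not figures from this model card.

PARAMS = 32.8e9               # approximate parameter count of Qwen 32B (assumption)
BITS_PER_WEIGHT = 4           # wint4 quantization
GROUP_SIZE = 128              # typical AWQ group size (assumption)
OVERHEAD_BYTES_PER_GROUP = 4  # fp16 scale + fp16 zero point per group (assumption)

# Packed 4-bit weights plus per-group quantization metadata.
weights_gb = PARAMS * BITS_PER_WEIGHT / 8 / 1e9
overhead_gb = PARAMS / GROUP_SIZE * OVERHEAD_BYTES_PER_GROUP / 1e9
total_gb = weights_gb + overhead_gb

print(f"weights ~ {weights_gb:.1f} GB, overhead ~ {overhead_gb:.1f} GB, "
      f"total ~ {total_gb:.1f} GB")
```

Under these assumptions the weights come to roughly 17 GB, leaving headroom for the KV cache on a 24 GB card, which is consistent with the claim above.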
## MMLU-PRO

The MMLU-PRO benchmark evaluates models across 14 subject categories using 5-shot accuracy. Each task follows the methodology of the original MMLU implementation, extended to ten answer choices per question.
### Measure

- Accuracy: evaluated as `exact_match`

### Shots

- 5-shot
### Tasks

| Task | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|
| mmlu_pro | custom-extract | 5 | exact_match | 0.5875 | 0.0044 |
| biology | custom-extract | 5 | exact_match | 0.7978 | 0.0150 |
| business | custom-extract | 5 | exact_match | 0.5982 | 0.0175 |
| chemistry | custom-extract | 5 | exact_match | 0.4691 | 0.0148 |
| computer_science | custom-extract | 5 | exact_match | 0.6122 | 0.0241 |
| economics | custom-extract | 5 | exact_match | 0.7346 | 0.0152 |
| engineering | custom-extract | 5 | exact_match | 0.3891 | 0.0157 |
| health | custom-extract | 5 | exact_match | 0.6345 | 0.0168 |
| history | custom-extract | 5 | exact_match | 0.6168 | 0.0249 |
| law | custom-extract | 5 | exact_match | 0.4596 | 0.0150 |
| math | custom-extract | 5 | exact_match | 0.6425 | 0.0130 |
| other | custom-extract | 5 | exact_match | 0.6223 | 0.0160 |
| philosophy | custom-extract | 5 | exact_match | 0.5731 | 0.0222 |
| physics | custom-extract | 5 | exact_match | 0.5073 | 0.0139 |
| psychology | custom-extract | 5 | exact_match | 0.7494 | 0.0154 |
### Groups

| Group | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|
| mmlu_pro | custom-extract | 5 | exact_match | 0.5875 | 0.0044 |
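As a usage sketch, an AWQ checkpoint like this one can typically be served with vLLM, which supports AWQ quantization out of the box. The repository id and the context-length flag below are placeholders/assumptions, not values from this card:

```shell
pip install vllm

# <your-namespace> is a placeholder — substitute this model's actual Hub path.
vllm serve <your-namespace>/DeepSeek-R1-Distill-Qwen-32B-AWQ \
  --quantization awq \
  --max-model-len 16384
```

This exposes an OpenAI-compatible endpoint on localhost; a lower `--max-model-len` can be chosen to trade context length for KV-cache headroom on a 24 GB GPU.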