---
license: mit
pipeline_tag: text-generation
tags:
- code
- deepseek_v3
- qwen
- int4
- conversational
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
---
# DeepSeek-R1-Distill-Qwen-32B-AWQ wint4

A distillation of DeepSeek-R1 into Qwen 32B, quantized to 4-bit weights (wint4) with AWQ. The quantized model fits on any GPU with 24 GB of VRAM, or on a device with 32 GB of unified memory.
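As a rough sanity check on the memory claim above, the wint4 footprint can be estimated from the parameter count. The figures below (parameter count, AWQ group size, per-group scale/zero overhead) are assumptions for illustration, not values taken from this card:

```python
# Back-of-the-envelope estimate of the wint4 weight footprint.
# All constants are assumptions, not figures from this model card.

PARAMS = 32.8e9               # approximate parameter count of Qwen 32B (assumption)
BITS_PER_WEIGHT = 4           # wint4 quantization
GROUP_SIZE = 128              # typical AWQ group size (assumption)
OVERHEAD_BYTES_PER_GROUP = 4  # fp16 scale + fp16 zero point per group (assumption)

# Packed 4-bit weights plus per-group quantization metadata.
weights_gb = PARAMS * BITS_PER_WEIGHT / 8 / 1e9
overhead_gb = PARAMS / GROUP_SIZE * OVERHEAD_BYTES_PER_GROUP / 1e9
total_gb = weights_gb + overhead_gb

print(f"weights ~ {weights_gb:.1f} GB, overhead ~ {overhead_gb:.1f} GB, "
      f"total ~ {total_gb:.1f} GB")
```

Under these assumptions the weights come to roughly 17 GB, leaving headroom for the KV cache on a 24 GB card, which is consistent with the claim above.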
## MMLU-PRO

The MMLU-PRO benchmark evaluates models across 14 subject categories using 5-shot accuracy. Each task follows the methodology of the original MMLU implementation, extended to ten answer choices per question.
### Measure

- Accuracy: evaluated as `exact_match`

### Shots

- 5-shot
### Tasks

| Task | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|
| mmlu_pro | custom-extract | 5 | exact_match | 0.5875 | 0.0044 |
| biology | custom-extract | 5 | exact_match | 0.7978 | 0.0150 |
| business | custom-extract | 5 | exact_match | 0.5982 | 0.0175 |
| chemistry | custom-extract | 5 | exact_match | 0.4691 | 0.0148 |
| computer_science | custom-extract | 5 | exact_match | 0.6122 | 0.0241 |
| economics | custom-extract | 5 | exact_match | 0.7346 | 0.0152 |
| engineering | custom-extract | 5 | exact_match | 0.3891 | 0.0157 |
| health | custom-extract | 5 | exact_match | 0.6345 | 0.0168 |
| history | custom-extract | 5 | exact_match | 0.6168 | 0.0249 |
| law | custom-extract | 5 | exact_match | 0.4596 | 0.0150 |
| math | custom-extract | 5 | exact_match | 0.6425 | 0.0130 |
| other | custom-extract | 5 | exact_match | 0.6223 | 0.0160 |
| philosophy | custom-extract | 5 | exact_match | 0.5731 | 0.0222 |
| physics | custom-extract | 5 | exact_match | 0.5073 | 0.0139 |
| psychology | custom-extract | 5 | exact_match | 0.7494 | 0.0154 |
### Groups

| Group | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|
| mmlu_pro | custom-extract | 5 | exact_match | 0.5875 | 0.0044 |
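As a usage sketch, an AWQ checkpoint like this one can typically be served with vLLM, which supports AWQ quantization out of the box. The repository id and the context-length flag below are placeholders/assumptions, not values from this card:

```shell
pip install vllm

# <your-namespace> is a placeholder — substitute this model's actual Hub path.
vllm serve <your-namespace>/DeepSeek-R1-Distill-Qwen-32B-AWQ \
  --quantization awq \
  --max-model-len 16384
```

This exposes an OpenAI-compatible endpoint on localhost; a lower `--max-model-len` can be chosen to trade context length for KV-cache headroom on a 24 GB GPU.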