---
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Llama-8B
tags:
- Llama
- EdgeAI
---

# DeepSeek-R1-Distill-Llama-8B Quantized Models

This repository contains Q4_KM and Q5_KM quantized versions of the [DeepSeek-R1-Distill-Llama-8B](https://huggingface.co./deepseek-ai/DeepSeek-R1-Distill-Llama-8B) model, optimized for efficient deployment while maintaining strong performance.

Discover our full range of quantized language models on our [SandLogic Lexicon Hugging Face page](https://huggingface.co./SandLogicTechnologies). To learn more about our company and services, visit [SandLogic](https://www.sandlogic.com/).

## Model Description

These models are quantized versions of DeepSeek-R1-Distill-Llama-8B, a distilled 8B-parameter model based on the Llama architecture. The original model demonstrates that the reasoning patterns of larger models can be effectively distilled into smaller architectures.

### Available Quantized Versions

1. **Q4_KM Version**
   - 4-bit quantization using llama.cpp's k-quant (medium) scheme
   - Approximately 4GB model size
   - Good balance between model size and output quality
   - Recommended for resource-constrained environments

2. **Q5_KM Version**
   - 5-bit quantization using llama.cpp's k-quant (medium) scheme
   - Approximately 5GB model size
   - Higher precision than Q4 while still offering a significant size reduction
   - Recommended when higher accuracy is needed

## Usage

```bash
pip install llama-cpp-python
```

Please refer to the llama-cpp-python [documentation](https://llama-cpp-python.readthedocs.io/en/latest/) for instructions on installing with GPU support.

### Basic Text Completion

Here's an example demonstrating how to use the high-level API for basic text completion:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="model/path/",
    verbose=False,
    # n_gpu_layers=-1,  # Uncomment to use GPU acceleration
    # n_ctx=2048,       # Uncomment to increase the context window
)

output = llm(
    "Q: Name the planets in the solar system? A: ",  # Prompt
    max_tokens=32,      # Generate up to 32 tokens
    stop=["Q:", "\n"],  # Stop generating just before a new question
    echo=False,         # Don't echo the prompt in the output
)

print(output["choices"][0]["text"])
```

## License

This model inherits the license of the original DeepSeek-R1-Distill-Llama-8B model. Please refer to the original model's license for usage terms and conditions.

## Acknowledgments

We thank the DeepSeek AI team for open-sourcing their distilled models and demonstrating that smaller models can achieve impressive performance through effective distillation techniques.
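One practical note on usage: DeepSeek-R1 distills emit their reasoning trace inside `<think>...</think>` tags before the final answer, so chat-style use typically pairs llama-cpp-python's `create_chat_completion` API with a small post-processing step. The sketch below is illustrative rather than part of the original card: `strip_reasoning` and `MODEL_PATH` are hypothetical names, and the path must be replaced with the location of a downloaded GGUF file.

```python
import os


def strip_reasoning(text: str) -> str:
    """Return only the final answer, dropping the model's <think>...</think> trace."""
    _, sep, tail = text.partition("</think>")
    return tail.strip() if sep else text.strip()


MODEL_PATH = "model/path/"  # replace with the local path to a downloaded GGUF file

if os.path.isfile(MODEL_PATH):
    # Deferred import so the helper above is usable without llama-cpp-python installed.
    from llama_cpp import Llama

    llm = Llama(model_path=MODEL_PATH, verbose=False)
    response = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Name the planets in the solar system."}],
        max_tokens=512,  # leave headroom: the reasoning trace consumes tokens too
    )
    print(strip_reasoning(response["choices"][0]["message"]["content"]))
```

Stopping on `"\n"` (as in the completion example above) is not appropriate here, since the reasoning trace spans multiple lines before the answer appears.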