---
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Llama-8B
tags:
- Llama
- EdgeAI
---
# DeepSeek-R1-Distill-Llama-8B Quantized Models
This repository contains Q4_KM and Q5_KM quantized versions of the [DeepSeek-R1-Distill-Llama-8B](https://huggingface.co./deepseek-ai/DeepSeek-R1-Distill-Llama-8B) model, optimized for efficient deployment while maintaining strong performance.
Discover our full range of quantized language models on our [SandLogic Lexicon Hugging Face page](https://huggingface.co./SandLogicTechnologies). To learn more about our company and services, visit our website at [SandLogic](https://www.sandlogic.com/).
## Model Description
These models are quantized versions of DeepSeek-R1-Distill-Llama-8B, an 8B-parameter model distilled from DeepSeek-R1 onto the Llama architecture. The original model demonstrates that reasoning patterns from larger models can be effectively distilled into smaller ones.
### Available Quantized Versions
1. **Q4_KM Version**
   - 4-bit quantization using the llama.cpp k-quant scheme (medium variant)
   - Approximately 4 GB model size
   - Optimal balance between model size and performance
   - Recommended for resource-constrained environments
2. **Q5_KM Version**
   - 5-bit quantization using the llama.cpp k-quant scheme (medium variant)
   - Approximately 5 GB model size
   - Higher precision than Q4 while maintaining a significant size reduction
   - Recommended when higher accuracy is needed (a download sketch follows this list)
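To fetch one of these files programmatically, here is a minimal sketch using the `huggingface_hub` library. The `repo_id` and `filename` values are assumptions for illustration; replace them with the actual repository ID and GGUF filename listed on this repository's **Files** tab.
```python
# Minimal download sketch -- repo_id and filename below are assumptions;
# substitute the actual values from this repository's file listing.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="SandLogicTechnologies/DeepSeek-R1-Distill-Llama-8B-GGUF",  # assumed repo ID
    filename="DeepSeek-R1-Distill-Llama-8B-Q4_KM.gguf",                 # assumed filename
)
print(model_path)  # Local path you can pass to llama-cpp-python
```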
## Usage
```bash
pip install llama-cpp-python
```
Please refer to the llama-cpp-python [documentation](https://llama-cpp-python.readthedocs.io/en/latest/) to install with GPU support.
### Basic Text Completion
Here's an example demonstrating how to use the high-level API for basic text completion:
```python
from llama_cpp import Llama

llm = Llama(
    model_path="path/to/model.gguf",  # Path to the downloaded GGUF file
    verbose=False,
    # n_gpu_layers=-1,  # Uncomment to offload all layers to the GPU
    # n_ctx=2048,       # Uncomment to increase the context window
)

output = llm(
    "Q: Name the planets in the solar system? A: ",  # Prompt
    max_tokens=32,       # Generate up to 32 tokens
    stop=["Q:", "\n"],   # Stop generating just before a new question
    echo=False,          # Don't echo the prompt in the output
)
print(output["choices"][0]["text"])
```
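### Chat Completion
Beyond raw completion, llama-cpp-python also exposes a chat-style API via `create_chat_completion`, which applies the chat template stored in the GGUF metadata. Below is a minimal sketch; the prompt and sampling values are illustrative, and DeepSeek suggests a temperature in the 0.5–0.7 range for the R1 distills.
```python
from llama_cpp import Llama

llm = Llama(
    model_path="path/to/model.gguf",  # Local GGUF file
    n_ctx=4096,                       # Reasoning traces can be long
    verbose=False,
)

# create_chat_completion formats messages with the model's chat template
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Explain step by step: what is 17 * 24?"}
    ],
    max_tokens=512,   # Leave room for the model's reasoning before the answer
    temperature=0.6,  # Within DeepSeek's suggested 0.5-0.7 range for R1 distills
)
print(response["choices"][0]["message"]["content"])
```
Since R1-style distills tend to emit their chain of thought before the final answer, budget considerably more tokens than the 32 used in the completion example above.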
## License
This model inherits the license of the original DeepSeek-R1-Distill-Llama-8B model. Please refer to the original model's license for usage terms and conditions.
## Acknowledgments
We thank the DeepSeek AI team for open-sourcing their distilled models and demonstrating that smaller models can achieve impressive performance through effective distillation techniques.