---
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Llama-8B
tags:
- Llama
- EdgeAI
---
|
# DeepSeek-R1-Distill-Llama-8B Quantized Models |
|
|
|
This repository contains Q4_K_M and Q5_K_M quantized GGUF versions of the [DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) model, optimized for efficient deployment while maintaining strong performance.
|
|
|
Discover our full range of quantized language models on the [SandLogic Lexicon](https://huggingface.co/SandLogicTechnologies) Hugging Face page. To learn more about our company and services, visit [SandLogic](https://www.sandlogic.com/).
|
## Model Description |
|
|
|
These models are quantized versions of DeepSeek-R1-Distill-Llama-8B, an 8B-parameter model built on the Llama architecture and fine-tuned on reasoning data distilled from DeepSeek-R1. The original model demonstrates that the reasoning patterns of larger models can be effectively distilled into smaller architectures.
|
|
|
### Available Quantized Versions |
|
|
|
1. **Q4_K_M Version**

   - 4-bit quantization using llama.cpp's k-quant scheme (medium variant)

   - Approximately 4.9 GB model size

   - Good balance between model size and output quality

   - Recommended for resource-constrained environments

2. **Q5_K_M Version**

   - 5-bit quantization using llama.cpp's k-quant scheme (medium variant)

   - Approximately 5.7 GB model size

   - Higher precision than Q4_K_M while still offering a significant size reduction

   - Recommended when higher accuracy is needed
|
|
|
|
|
## Usage |
|
|
|
|
|
Install [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) to run these quantized models:

```bash
|
pip install llama-cpp-python |
|
``` |
|
Please refer to the llama-cpp-python [documentation](https://llama-cpp-python.readthedocs.io/en/latest/) to install with GPU support. |
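For example, on a CUDA-capable machine the library can be built with GPU offloading enabled at install time (recent releases use the `GGML_CUDA` flag; older ones used `LLAMA_CUBLAS`, so check the docs for your version and backend):

```bash
# Reinstall llama-cpp-python with CUDA support (see the docs for Metal, ROCm, etc.)
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --force-reinstall --no-cache-dir
```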
|
|
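Before running the examples below, download one of the quantized files. A minimal sketch using `huggingface_hub`; the `repo_id` and `filename` here are placeholders, so check this repository's "Files and versions" tab for the exact names:

```python
from huggingface_hub import hf_hub_download

# Both repo_id and filename are assumptions for illustration --
# replace them with the actual values from this repository.
model_path = hf_hub_download(
    repo_id="SandLogicTechnologies/DeepSeek-R1-Distill-Llama-8B-GGUF",
    filename="DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf",
)
print(model_path)  # Local path to pass to Llama(model_path=...)
```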
|
### Basic Text Completion |
|
Here's an example demonstrating how to use the high-level API for basic text completion: |
|
|
|
```python
|
from llama_cpp import Llama |
|
|
|
llm = Llama( |
|
    model_path="path/to/model.gguf",  # Path to the downloaded GGUF file
|
verbose=False, |
|
# n_gpu_layers=-1, # Uncomment to use GPU acceleration |
|
# n_ctx=2048, # Uncomment to increase the context window |
|
) |
|
|
|
output = llm(

    "Q: Name the planets in the solar system. A: ",  # Prompt

    max_tokens=32,  # Generate up to 32 tokens

    stop=["Q:", "\n"],  # Stop generating just before a new question

    echo=False,  # Don't echo the prompt in the output

)
|
|
|
print(output["choices"][0]["text"]) |
|
``` |
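### Chat Completion

The high-level API also supports chat-style inference. Below is a minimal sketch using `create_chat_completion`, assuming the GGUF file embeds a chat template; the temperature follows DeepSeek's published recommendation of 0.5 to 0.7 (0.6 suggested) for the R1 distills:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="path/to/model.gguf",  # Path to the downloaded GGUF file
    verbose=False,
)

# DeepSeek recommends a sampling temperature around 0.6 for R1-distilled
# models to reduce repetition and incoherent output.
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "How many r's are in the word strawberry?"}
    ],
    temperature=0.6,
    max_tokens=1024,
)

print(response["choices"][0]["message"]["content"])
```

Note that R1-style models emit their chain of thought in `<think>...</think>` tags before the final answer, so allow a generous `max_tokens` budget for the reasoning.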
|
|
|
## License |
|
|
|
These quantized models inherit the license of the original DeepSeek-R1-Distill-Llama-8B model. Please refer to the original model's license for usage terms and conditions.
|
|
|
## Acknowledgments |
|
|
|
We thank the DeepSeek AI team for open-sourcing their distilled models and demonstrating that smaller models can achieve impressive performance through effective distillation techniques. |
|
|
|
|