---
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Llama-8B
tags:
- Llama
- EdgeAI
---

# DeepSeek-R1-Distill-Llama-8B Quantized Models

This repository contains Q4_KM and Q5_KM quantized versions of the [DeepSeek-R1-Distill-Llama-8B](https://huggingface.co./deepseek-ai/DeepSeek-R1-Distill-Llama-8B) model, optimized for efficient deployment while maintaining strong performance.

Discover our full range of quantized language models on our [SandLogic Lexicon Hugging Face page](https://huggingface.co./SandLogicTechnologies). To learn more about our company and services, visit [SandLogic](https://www.sandlogic.com/).

## Model Description

These models are quantized versions of DeepSeek-R1-Distill-Llama-8B, a distilled 8B-parameter model based on the Llama architecture. The original model demonstrates that the reasoning patterns of larger models can be effectively distilled into smaller architectures.

### Available Quantized Versions

1. **Q4_KM Version**
   - 4-bit quantization using llama.cpp's k-quant (medium) scheme
   - Approximately 4GB model size
   - Good balance between model size and output quality
   - Recommended for resource-constrained environments

2. **Q5_KM Version**
   - 5-bit quantization using llama.cpp's k-quant (medium) scheme
   - Approximately 5GB model size
   - Higher precision than Q4 while still offering a significant size reduction
   - Recommended when higher accuracy is needed

## Usage

```bash
pip install llama-cpp-python
```

Please refer to the llama-cpp-python [documentation](https://llama-cpp-python.readthedocs.io/en/latest/) for instructions on installing with GPU support.

### Basic Text Completion

Here's an example demonstrating how to use the high-level API for basic text completion:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="model/path/",
    verbose=False,
    # n_gpu_layers=-1,  # Uncomment to use GPU acceleration
    # n_ctx=2048,       # Uncomment to increase the context window
)

output = llm(
    "Q: Name the planets in the solar system? A: ",  # Prompt
    max_tokens=32,      # Generate up to 32 tokens
    stop=["Q:", "\n"],  # Stop generating just before a new question
    echo=False,         # Don't echo the prompt in the output
)

print(output["choices"][0]["text"])
```

## License

This model inherits the license of the original DeepSeek-R1-Distill-Llama-8B model. Please refer to the original model's license for usage terms and conditions.

## Acknowledgments

We thank the DeepSeek AI team for open-sourcing their distilled models and demonstrating that smaller models can achieve impressive performance through effective distillation techniques.
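One practical note on usage: DeepSeek-R1 distills emit their reasoning trace inside `<think>...</think>` tags before the final answer, so chat-style use typically pairs llama-cpp-python's `create_chat_completion` API with a small post-processing step. The sketch below is illustrative rather than part of the original card: `strip_reasoning` and `MODEL_PATH` are hypothetical names, and the path must be replaced with the location of a downloaded GGUF file.

```python
import os


def strip_reasoning(text: str) -> str:
    """Return only the final answer, dropping the model's <think>...</think> trace."""
    _, sep, tail = text.partition("</think>")
    return tail.strip() if sep else text.strip()


MODEL_PATH = "model/path/"  # replace with the local path to a downloaded GGUF file

if os.path.isfile(MODEL_PATH):
    # Deferred import so the helper above is usable without llama-cpp-python installed.
    from llama_cpp import Llama

    llm = Llama(model_path=MODEL_PATH, verbose=False)
    response = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Name the planets in the solar system."}],
        max_tokens=512,  # leave headroom: the reasoning trace consumes tokens too
    )
    print(strip_reasoning(response["choices"][0]["message"]["content"]))
```

Stopping on `"\n"` (as in the completion example above) is not appropriate here, since the reasoning trace spans multiple lines before the answer appears.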