---
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Llama-8B
tags:
- Llama
- EdgeAI
---
# DeepSeek-R1-Distill-Llama-8B Quantized Models
This repository contains Q4_KM and Q5_KM quantized versions of the [DeepSeek-R1-Distill-Llama-8B](https://huggingface.co./deepseek-ai/DeepSeek-R1-Distill-Llama-8B) model, optimized for efficient deployment while maintaining strong performance.
Discover our full range of quantized language models on our [SandLogic Lexicon Hugging Face page](https://huggingface.co./SandLogicTechnologies). To learn more about our company and services, visit our website at [SandLogic](https://www.sandlogic.com/).
## Model Description
These models are quantized versions of DeepSeek-R1-Distill-Llama-8B, an 8B-parameter model distilled from DeepSeek-R1 onto the Llama architecture. The original model demonstrates that reasoning patterns from larger models can be effectively distilled into smaller ones.
### Available Quantized Versions
1. **Q4_KM Version**
   - 4-bit quantization using the llama.cpp k-quant scheme (medium variant)
   - Approximately 4 GB model size
   - Optimal balance between model size and performance
   - Recommended for resource-constrained environments
2. **Q5_KM Version**
   - 5-bit quantization using the llama.cpp k-quant scheme (medium variant)
   - Approximately 5 GB model size
   - Higher precision than Q4 while maintaining a significant size reduction
   - Recommended when higher accuracy is needed (a download sketch follows this list)
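To fetch one of these files programmatically, here is a minimal sketch using the `huggingface_hub` library. The `repo_id` and `filename` values are assumptions for illustration; replace them with the actual repository ID and GGUF filename listed on this repository's **Files** tab.
```python
# Minimal download sketch -- repo_id and filename below are assumptions;
# substitute the actual values from this repository's file listing.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="SandLogicTechnologies/DeepSeek-R1-Distill-Llama-8B-GGUF",  # assumed repo ID
    filename="DeepSeek-R1-Distill-Llama-8B-Q4_KM.gguf",                 # assumed filename
)
print(model_path)  # Local path you can pass to llama-cpp-python
```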
## Usage
```bash
pip install llama-cpp-python
```
Please refer to the llama-cpp-python [documentation](https://llama-cpp-python.readthedocs.io/en/latest/) to install with GPU support.
### Basic Text Completion
Here's an example demonstrating how to use the high-level API for basic text completion:
```python
from llama_cpp import Llama

llm = Llama(
    model_path="path/to/model.gguf",  # Path to the downloaded GGUF file
    verbose=False,
    # n_gpu_layers=-1,  # Uncomment to offload all layers to the GPU
    # n_ctx=2048,       # Uncomment to increase the context window
)

output = llm(
    "Q: Name the planets in the solar system? A: ",  # Prompt
    max_tokens=32,       # Generate up to 32 tokens
    stop=["Q:", "\n"],   # Stop generating just before a new question
    echo=False,          # Don't echo the prompt in the output
)
print(output["choices"][0]["text"])
```
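### Chat Completion
Beyond raw completion, llama-cpp-python also exposes a chat-style API via `create_chat_completion`, which applies the chat template stored in the GGUF metadata. Below is a minimal sketch; the prompt and sampling values are illustrative, and DeepSeek suggests a temperature in the 0.5–0.7 range for the R1 distills.
```python
from llama_cpp import Llama

llm = Llama(
    model_path="path/to/model.gguf",  # Local GGUF file
    n_ctx=4096,                       # Reasoning traces can be long
    verbose=False,
)

# create_chat_completion formats messages with the model's chat template
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Explain step by step: what is 17 * 24?"}
    ],
    max_tokens=512,   # Leave room for the model's reasoning before the answer
    temperature=0.6,  # Within DeepSeek's suggested 0.5-0.7 range for R1 distills
)
print(response["choices"][0]["message"]["content"])
```
Since R1-style distills tend to emit their chain of thought before the final answer, budget considerably more tokens than the 32 used in the completion example above.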
## License
This model inherits the license of the original DeepSeek-R1-Distill-Llama-8B model. Please refer to the original model's license for usage terms and conditions.
## Acknowledgments
We thank the DeepSeek AI team for open-sourcing their distilled models and demonstrating that smaller models can achieve impressive performance through effective distillation techniques.