StableLM 2 Zephyr 1.6B
Model Details
- Model Name: Zephyr 1.6B (GGUF format)
- Quantization Options: `F16` (16-bit float), `Q8_0` (8-bit integer)
This repository hosts GGUF versions of the Stability AI Zephyr 1.6B model (a 16-bit `F16` conversion and an 8-bit `Q8_0` quantization) for efficient inference with the `llama.cpp` library. They are tuned for performance and low memory usage, which makes them suitable for a variety of platforms, including constrained hardware such as ARM64 devices and low-memory x86 machines.
Core Libraries
- Core Library: llama.cpp
- Model Format: GGUF (`F16` and `Q8_0`)
- Original Model Source: stabilityai/stablelm-2-zephyr-1_6b
The original Zephyr 1.6B model has been converted to the GGUF format used by `llama.cpp`, so it can be loaded by GGUF-compatible inference tools without further conversion.
Quantized Model Files
Format | File Name | Size | Description |
---|---|---|---|
F16 | ggml-model-f16.gguf | ~3.2 GB | 16-bit float precision; highest accuracy, largest file |
Q8_0 | ggml-model-q8_0.gguf | ~1.8 GB | 8-bit integer precision for reduced memory usage |
The `F16` format preserves high precision and is ideal when output quality is crucial. The `Q8_0` format roughly halves the file size (about 1.8 GB versus 3.2 GB), making it suitable for deployments where memory is the key constraint.
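The sizes in the table follow from simple per-weight arithmetic. The sketch below is an estimate only, assuming a nominal 1.6 billion parameters (`PARAMS` is an assumed value, not read from the model) and ignoring GGUF metadata and any tensors stored at a different precision. `F16` uses 2 bytes per weight; `Q8_0` stores blocks of 32 int8 weights plus one 16-bit scale, i.e. 34 bytes per 32 weights (8.5 bits per weight). `estimate_size_gb` is a hypothetical helper name.

```python
# Rough GGUF file-size estimate from a parameter count.
# Assumption: ~1.6e9 parameters (the nominal "1.6B"); metadata and any
# tensors kept at other precisions are ignored.

PARAMS = 1.6e9  # assumed nominal parameter count

# Storage cost per weight, in bytes.
BYTES_PER_WEIGHT = {
    "F16": 2.0,       # one 16-bit float per weight
    "Q8_0": 34 / 32,  # 32 int8 weights + one fp16 scale per block = 34 bytes
}


def estimate_size_gb(fmt: str, params: float = PARAMS) -> float:
    """Approximate file size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_WEIGHT[fmt] / 1e9


if __name__ == "__main__":
    for fmt in BYTES_PER_WEIGHT:
        print(f"{fmt}: ~{estimate_size_gb(fmt):.1f} GB")
```

With these assumptions the estimates land at about 3.2 GB and 1.7 GB, close to the table; the exact figures depend on the true parameter count and file metadata.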
Hardware Recommendations
Format | Minimum RAM | Recommended GPU Memory |
---|---|---|
F16 | 8 GB | 16 GB (with offloading) |
Q8_0 | 4 GB | 8 GB (CPU only recommended) |
For best performance, use GPU offloading with the `F16` format. The `Q8_0` variant runs well on CPU-only machines with limited RAM.
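If you script deployments, the minimum-RAM column of the Hardware Recommendations table can be encoded as a small lookup. This is a hypothetical helper (`pick_model_file` is not part of `llama.cpp`); it only compares a number you pass in against the table's thresholds and does not query the machine's actual memory.

```python
# Hypothetical helper: choose a GGUF file using the "Minimum RAM" column
# of the Hardware Recommendations table. It does not inspect real hardware.

MIN_RAM_GB = {
    "F16": 8,   # ggml-model-f16.gguf
    "Q8_0": 4,  # ggml-model-q8_0.gguf
}

FILES = {
    "F16": "ggml-model-f16.gguf",
    "Q8_0": "ggml-model-q8_0.gguf",
}


def pick_model_file(available_ram_gb: float) -> str:
    """Return the highest-precision file whose minimum RAM requirement is met."""
    # Check formats from highest to lowest precision.
    for fmt in ("F16", "Q8_0"):
        if available_ram_gb >= MIN_RAM_GB[fmt]:
            return FILES[fmt]
    raise ValueError(
        f"{available_ram_gb} GB is below the {MIN_RAM_GB['Q8_0']} GB minimum for Q8_0"
    )
```

For example, `pick_model_file(6)` returns `ggml-model-q8_0.gguf`, while `pick_model_file(16)` returns the `F16` file.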
Usage Example
You can use the following commands to run the Zephyr 1.6B models using llama.cpp
:
Running with `F16` Format
```bash
./main -m ggml-model-f16.gguf --n-predict -1 --prompt "Write a Python function that computes the Fibonacci sequence."
```
Running with `Q8_0` Format
```bash
./main -m ggml-model-q8_0.gguf --n-predict -1 --prompt "Explain the concept of machine learning in simple terms."
```
GPU Offloading (for `F16` Models)
To use GPU offloading with the `F16` model, add `--n-gpu-layers`, which sets how many model layers are offloaded to the GPU:
```bash
./main -m ggml-model-f16.gguf --n-predict -1 --prompt "Summarize the impact of quantum computing on cryptography." --n-gpu-layers 32
```
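To call these commands from a Python application, passing the arguments as a list avoids shell-quoting problems with prompts. This is a minimal sketch, assuming Python 3 and a compiled `llama.cpp` binary at `./main` (or `llama-cli` in newer builds); `build_command` and `run` are hypothetical helper names, and only the flags shown in the commands above are used.

```python
import shlex
import subprocess
from typing import List, Optional


def build_command(
    model_path: str,
    prompt: str,
    n_predict: int = -1,
    n_gpu_layers: Optional[int] = None,
    binary: str = "./main",  # assumption: use "llama-cli" on newer builds
) -> List[str]:
    """Assemble the llama.cpp CLI arguments used in the examples above."""
    cmd = [binary, "-m", model_path, "--n-predict", str(n_predict), "--prompt", prompt]
    if n_gpu_layers is not None:
        # Offload this many layers to the GPU (the F16 example uses 32).
        cmd += ["--n-gpu-layers", str(n_gpu_layers)]
    return cmd


def run(cmd: List[str]) -> str:
    """Run llama.cpp and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    cmd = build_command(
        "ggml-model-f16.gguf",
        "Summarize the impact of quantum computing on cryptography.",
        n_gpu_layers=32,
    )
    print(shlex.join(cmd))  # shell-equivalent form, useful for logging
    # print(run(cmd))  # uncomment once the binary and model file are present
```

Leaving `n_gpu_layers` as `None` reproduces the CPU-only commands; setting it mirrors the GPU offloading example.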
Safety and Responsible Use
The Zephyr 1.6B model was trained using a combination of instruction tuning and synthetic data to improve safety and response coherence. As with any large language model, however, its outputs may still be inaccurate or misaligned with user expectations, so always review them, particularly in sensitive applications.
For more details, refer to the original model card.
License
The quantized models in this repository are released under the `CC-BY-NC-SA-4.0` license. For details, see the license file.
Citation
If you use the Zephyr 1.6B models in your research or applications, please cite the original authors:
```bibtex
@article{stabilitylm2023,
  title={StableLM 2: Zephyr 1.6B},
  author={Stability AI},
  year={2023}
}
```