---
license: apache-2.0
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
tags:
- deepseek
- llama.cpp
library_name: transformers
pipeline_tag: text-generation
quantized_by: hdnh2006
---

# DeepSeek-R1-Distill-Qwen-1.5B GGUF llama.cpp quantization by [Henry Navarro](https://henrynavarro.org) 🧠🤖

This repository contains GGUF format model files for [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B), quantized using [llama.cpp](https://github.com/ggerganov/llama.cpp).

All models were quantized following the [instructions](https://github.com/ggerganov/llama.cpp/blob/master/examples/quantize/README.md#quantize) provided by llama.cpp, namely:
```bash
# obtain the official LLaMA model weights and place them in ./models
ls ./models
llama-2-7b tokenizer_checklist.chk tokenizer.model
# [Optional] for models using BPE tokenizers
ls ./models
<folder containing weights and tokenizer json> vocab.json
# [Optional] for PyTorch .bin models like Mistral-7B
ls ./models
<folder containing weights and tokenizer json>

# install Python dependencies
python3 -m pip install -r requirements.txt

# convert the model to ggml FP16 format
python3 convert_hf_to_gguf.py models/mymodel/

# quantize the model to 4-bits (using Q4_K_M method)
./llama-quantize ./models/mymodel/ggml-model-f16.gguf ./models/mymodel/ggml-model-Q4_K_M.gguf Q4_K_M

# update the gguf filetype to current version if older version is now unsupported
./llama-quantize ./models/mymodel/ggml-model-Q4_K_M.gguf ./models/mymodel/ggml-model-Q4_K_M-v2.gguf COPY
```
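
For reference, a sketch of the same workflow applied to this specific model is shown below. It is not a literal transcript of the commands used for this repository: it assumes a local llama.cpp checkout with the `llama-quantize` binary already built, and the paths and output filenames are illustrative.

```bash
# sketch only: reproduce the quantization for DeepSeek-R1-Distill-Qwen-1.5B
# (assumes a built llama.cpp checkout; paths and filenames are illustrative)

# download the original model from Hugging Face
huggingface-cli download deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
  --local-dir ./models/DeepSeek-R1-Distill-Qwen-1.5B

# convert the checkpoint to GGUF in half precision (F16)
python3 convert_hf_to_gguf.py ./models/DeepSeek-R1-Distill-Qwen-1.5B \
  --outfile ./models/DeepSeek-R1-Distill-Qwen-1.5B-F16.gguf --outtype f16

# quantize the F16 file to each target type, e.g. Q4_K_M
./llama-quantize ./models/DeepSeek-R1-Distill-Qwen-1.5B-F16.gguf \
  ./models/DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf Q4_K_M
```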

## Model Details

Original model: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

## Summary of models 📋
| Filename | Quant type | Description |
| -------- | ---------- | ----------- |
| [DeepSeek-R1-Distill-Qwen-1.5B-F16.gguf](https://huggingface.co/hdnh2006/DeepSeek-R1-Distill-Qwen-1.5B/blob/main/DeepSeek-R1-Distill-Qwen-1.5B-F16.gguf) | F16 | Half precision, no quantization applied |
| [DeepSeek-R1-Distill-Qwen-1.5B-Q8_0.gguf](https://huggingface.co/hdnh2006/DeepSeek-R1-Distill-Qwen-1.5B/blob/main/DeepSeek-R1-Distill-Qwen-1.5B-Q8_0.gguf) | Q8_0 | 8-bit quantization, highest quality, largest quantized size |
| [DeepSeek-R1-Distill-Qwen-1.5B-Q6_K.gguf](https://huggingface.co/hdnh2006/DeepSeek-R1-Distill-Qwen-1.5B/blob/main/DeepSeek-R1-Distill-Qwen-1.5B-Q6_K.gguf) | Q6_K | 6-bit K-quant, very high quality |
| [DeepSeek-R1-Distill-Qwen-1.5B-Q5_1.gguf](https://huggingface.co/hdnh2006/DeepSeek-R1-Distill-Qwen-1.5B/blob/main/DeepSeek-R1-Distill-Qwen-1.5B-Q5_1.gguf) | Q5_1 | Legacy 5-bit quantization, good balance of quality and size |
| [DeepSeek-R1-Distill-Qwen-1.5B-Q5_K_M.gguf](https://huggingface.co/hdnh2006/DeepSeek-R1-Distill-Qwen-1.5B/blob/main/DeepSeek-R1-Distill-Qwen-1.5B-Q5_K_M.gguf) | Q5_K_M | 5-bit K-quant (medium), high quality for its size |
| [DeepSeek-R1-Distill-Qwen-1.5B-Q5_K_S.gguf](https://huggingface.co/hdnh2006/DeepSeek-R1-Distill-Qwen-1.5B/blob/main/DeepSeek-R1-Distill-Qwen-1.5B-Q5_K_S.gguf) | Q5_K_S | 5-bit K-quant (small), slightly smaller than Q5_K_M with slightly lower quality |
| [DeepSeek-R1-Distill-Qwen-1.5B-Q5_0.gguf](https://huggingface.co/hdnh2006/DeepSeek-R1-Distill-Qwen-1.5B/blob/main/DeepSeek-R1-Distill-Qwen-1.5B-Q5_0.gguf) | Q5_0 | Legacy 5-bit quantization, good balance of quality and size |
| [DeepSeek-R1-Distill-Qwen-1.5B-Q4_1.gguf](https://huggingface.co/hdnh2006/DeepSeek-R1-Distill-Qwen-1.5B/blob/main/DeepSeek-R1-Distill-Qwen-1.5B-Q4_1.gguf) | Q4_1 | Legacy 4-bit quantization, balanced quality and size |
| [DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf](https://huggingface.co/hdnh2006/DeepSeek-R1-Distill-Qwen-1.5B/blob/main/DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf) | Q4_K_M | 4-bit K-quant (medium), balanced quality and size, a good default choice |
| [DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_S.gguf](https://huggingface.co/hdnh2006/DeepSeek-R1-Distill-Qwen-1.5B/blob/main/DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_S.gguf) | Q4_K_S | 4-bit K-quant (small), smaller than Q4_K_M with a small quality loss |
| [DeepSeek-R1-Distill-Qwen-1.5B-Q4_0.gguf](https://huggingface.co/hdnh2006/DeepSeek-R1-Distill-Qwen-1.5B/blob/main/DeepSeek-R1-Distill-Qwen-1.5B-Q4_0.gguf) | Q4_0 | Legacy 4-bit quantization, balanced quality and size |
| [DeepSeek-R1-Distill-Qwen-1.5B-Q3_K_L.gguf](https://huggingface.co/hdnh2006/DeepSeek-R1-Distill-Qwen-1.5B/blob/main/DeepSeek-R1-Distill-Qwen-1.5B-Q3_K_L.gguf) | Q3_K_L | 3-bit K-quant (large), smaller size, lower quality |
| [DeepSeek-R1-Distill-Qwen-1.5B-Q3_K_M.gguf](https://huggingface.co/hdnh2006/DeepSeek-R1-Distill-Qwen-1.5B/blob/main/DeepSeek-R1-Distill-Qwen-1.5B-Q3_K_M.gguf) | Q3_K_M | 3-bit K-quant (medium), smaller size, lower quality |
| [DeepSeek-R1-Distill-Qwen-1.5B-Q3_K_S.gguf](https://huggingface.co/hdnh2006/DeepSeek-R1-Distill-Qwen-1.5B/blob/main/DeepSeek-R1-Distill-Qwen-1.5B-Q3_K_S.gguf) | Q3_K_S | 3-bit K-quant (small), smallest 3-bit variant, lower quality |
| [DeepSeek-R1-Distill-Qwen-1.5B-Q2_K.gguf](https://huggingface.co/hdnh2006/DeepSeek-R1-Distill-Qwen-1.5B/blob/main/DeepSeek-R1-Distill-Qwen-1.5B-Q2_K.gguf) | Q2_K | 2-bit quantization, smallest size, lowest quality |

## Usage with Ollama 🦙

### Direct from Ollama
```bash
ollama run hdnh2006/DeepSeek-R1-Distill-Qwen-1.5B
```

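Alternatively, you can point Ollama at one of the GGUF files from this repository to pick an exact quantization. The example below is a minimal sketch under a few assumptions: the GGUF filename matches the one downloaded in the section below, the local model name is arbitrary, and the temperature value is only illustrative; you may also want to add a `TEMPLATE` directive matching the DeepSeek-R1 chat format.

```bash
# minimal sketch: build a local Ollama model from a downloaded GGUF file
# (filename, model name and parameter values are illustrative)
cat > Modelfile <<'EOF'
FROM ./DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf
PARAMETER temperature 0.6
EOF

ollama create deepseek-r1-1.5b-local -f Modelfile
ollama run deepseek-r1-1.5b-local "Why is the sky blue?"
```
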
## Download Models Using huggingface-cli 🤗

### Installation of `huggingface_hub[cli]`
```bash
pip install -U "huggingface_hub[cli]"
```

### Downloading Specific Model Files
```bash
huggingface-cli download hdnh2006/DeepSeek-R1-Distill-Qwen-1.5B --include "DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf" --local-dir ./
```

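Once a file is downloaded, it can be run directly with llama.cpp. The commands below are a sketch that assumes a recent llama.cpp build with the `llama-cli` and `llama-server` binaries on your `PATH`; lower `-ngl` (or set it to 0) if you want to keep some or all layers on the CPU.

```bash
# run the downloaded GGUF with llama.cpp (assumes llama-cli / llama-server are built and on PATH)

# one-shot generation, offloading all layers to the GPU
llama-cli -m ./DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf -ngl 99 \
  -p "Explain quantization in one paragraph."

# or expose an OpenAI-compatible HTTP API
llama-server -m ./DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf -ngl 99 -c 4096 --port 8080
```
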
## Which File Should I Choose? 📈

A comprehensive analysis with performance charts is provided by Artefact2 [here](https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9).

### Assessing System Capabilities
1. **Determine Your Memory Budget**: Start by checking how much RAM and VRAM your system has available; this determines the largest model file you can run (a quick way to check is shown in the sketch after this list).
2. **Optimizing for Speed**:
   - **GPU Utilization**: To run the model as quickly as possible, aim to fit the entire file into your GPU's VRAM. Pick a version whose file size is 1-2GB smaller than your total VRAM.
3. **Maximizing Quality**:
   - **Combined Memory**: For the highest possible quality, add your system RAM and your GPU's VRAM together, then choose a file that is 1-2GB smaller than that combined total.

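The sketch below shows one way to gather these numbers on a Linux system with an NVIDIA GPU; the commands are standard utilities, but their availability and output format will vary across systems.

```bash
# rough sizing check (Linux with an NVIDIA GPU assumed)

# total VRAM per GPU
nvidia-smi --query-gpu=memory.total --format=csv,noheader

# total system RAM
free -h | awk '/^Mem:/ {print $2}'

# compare against the sizes of the downloaded GGUF files
ls -lh ./*.gguf
```
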
### Deciding Between 'I-Quant' and 'K-Quant'
1. **Simplicity**:
   - **K-Quant**: If you prefer a straightforward approach, select a K-quant model. These are labeled 'QX_K_X', such as Q5_K_M.
2. **Advanced Configuration**:
   - **Feature Chart**: For a more nuanced choice, refer to the [llama.cpp feature matrix](https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix).
   - **I-Quant Models**: Best suited for quantization levels below Q4 and for systems running cuBLAS (Nvidia) or rocBLAS (AMD). These are labeled 'IQX_X', such as IQ3_M, and offer better quality for their size.
   - **Compatibility Considerations**:
     - **I-Quant Models**: While usable on CPU and Apple Metal, they run slower than their K-quant counterparts, so the choice comes down to a tradeoff between speed and quality.
     - **AMD Cards**: Verify whether you are using the rocBLAS build or the Vulkan build; I-quants are not compatible with Vulkan (a quick way to check which backend your setup uses is sketched after this list).
     - **Current Support**: At the time of writing, LM Studio offers a preview with ROCm support, and other inference engines provide specific ROCm builds.

By following these guidelines, you can make an informed decision on which file best suits your system and performance needs.

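If you are unsure which GPU backend your llama.cpp setup actually uses, the sketch below lists a few generic checks. It assumes a Linux system and a locally built `llama-cli` binary; because newer llama.cpp builds can also load backends dynamically, treat the output as a hint rather than a definitive answer.

```bash
# rough backend checks (Linux assumed; results are hints, not guarantees)

# which GPU stacks are installed?
command -v nvidia-smi >/dev/null && echo "NVIDIA driver tools found"
command -v rocminfo   >/dev/null && echo "ROCm found"
command -v vulkaninfo >/dev/null && echo "Vulkan tools found"

# which acceleration libraries is the local llama.cpp binary linked against?
ldd ./llama-cli | grep -iE 'cuda|cublas|hip|rocblas|vulkan' \
  || echo "no GPU backend libraries linked (possibly a CPU build or dynamically loaded backends)"
```
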

## Contact 🌐
Website: [henrynavarro.org](https://henrynavarro.org)