Commit e557c95 by fimbulvntr (parent: df8511d): Create README.md

README.md (added):
Original model: https://huggingface.co/brucethemoose/Yi-34B-200K-RPMerge

Steps:
1. Convert to GGUF using llama.cpp (clone the repo, install its requirements, then run):
> `python convert.py /mnt/d/LLM_Models/Yi-34B-200K-RPMerge/ --vocab-type hfft --outtype f32 --outfile Yi-34B-200K-RPMerge.gguf`
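
Step 1 assumes llama.cpp has already been cloned and built from source. A minimal sketch of that setup (the build targets and install steps are assumptions based on the classic Makefile workflow; adjust for your checkout):

```shell
# Clone llama.cpp and build the tools used in the steps below.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt   # Python dependencies for convert.py
make imatrix quantize             # builds the ./imatrix and ./quantize binaries
```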
2. Create the importance matrix (offload as many layers as you can to the GPU with `-ngl`):
> `./imatrix -m /mnt/d/LLM_Models/Yi-34B-200K-RPMerge.gguf -f /mnt/d/LLM_Models/8k_random_data.txt -o /mnt/d/LLM_Models/Yi-34B-200K-RPMerge.imatrix.dat -ngl 20`
3. Quantize using the imatrix:
> `./quantize --imatrix /mnt/d/LLM_Models/Yi-34B-200K-RPMerge.imatrix.dat /mnt/d/LLM_Models/Yi-34B-200K-RPMerge.gguf /mnt/d/LLM_Models/Yi-34B-200K-RPMerge.IQ2_XXS.gguf IQ2_XXS`
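
The same imatrix file can be reused for other quantization levels. A sketch of batching several quant types in one loop (the extra type names are assumptions; they should match llama.cpp's quant identifiers):

```shell
# Reuse one imatrix to produce several quantization levels of the same model.
IMATRIX=/mnt/d/LLM_Models/Yi-34B-200K-RPMerge.imatrix.dat
SRC=/mnt/d/LLM_Models/Yi-34B-200K-RPMerge.gguf
for Q in IQ2_XXS IQ2_XS Q4_K_M; do
  ./quantize --imatrix "$IMATRIX" "$SRC" "/mnt/d/LLM_Models/Yi-34B-200K-RPMerge.$Q.gguf" "$Q"
done
```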

I have also uploaded [8k_random_data.txt from this GitHub discussion](https://github.com/ggerganov/llama.cpp/discussions/5006), along with the importance matrix I made (`Yi-34B-200K-RPMerge.imatrix.dat`).
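
To sanity-check the quantized file, one option is a short generation with llama.cpp's `main` binary (a sketch; the prompt and token count are illustrative, and `-ngl` should be tuned to your GPU):

```shell
# Quick smoke test of the quantized model.
./main -m /mnt/d/LLM_Models/Yi-34B-200K-RPMerge.IQ2_XXS.gguf -ngl 20 -p "Once upon a time" -n 64
```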