nisten committed on
Commit
3a5fe36
1 Parent(s): 8b281e3

Update README.md

Files changed (1)
  1. README.md +24 -3
README.md CHANGED
@@ -2,8 +2,29 @@
  license: apache-2.0
  ---

- Experimental quants of 4 headed mixtrals in various GGUF formats.
- Goal is to have the best performing MoE < 16 Gig.
- They still need training/finetuning
+ Experimental quants of 4-expert MoE Mixtrals in various GGUF formats.
+
+ Goal is to have the best-performing MoE under 10 GB.
+
+ They still need training/finetuning.
+
+ No sparsity tricks yet.
+
+ The 8.4 GB custom 2-bit quant works fine up to a 512-token context length, then starts looping.
+
+ Install llama.cpp from GitHub:
+
+ ```
+ git clone https://github.com/ggerganov/llama.cpp
+ cd llama.cpp
+ make -j
+ wget https://huggingface.co/nisten/quad-mixtrals-gguf/resolve/main/4mixq2.gguf
+ ./server -m 4mixq2.gguf --host "ec2-3-99-206-122.ca-central-1.compute.amazonaws.com" -c 512
+ ```
+
+ Limit output to 500 tokens.
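Once the server is running, the 500-token cap can be applied per request rather than globally. A minimal sketch, assuming the llama.cpp server's `/completion` endpoint on its default port 8080 (the prompt text is a placeholder, and the hostname is the one from the README's server command):

```shell
# Build a completion request that caps generation at 500 tokens.
# n_predict is the llama.cpp server parameter limiting how many
# tokens are generated for this request.
cat > payload.json <<'EOF'
{
  "prompt": "What is a mixture-of-experts model?",
  "n_predict": 500
}
EOF

# Send it to the running server (uncomment once the server is up;
# hostname from the README, default port 8080 assumed):
# curl http://ec2-3-99-206-122.ca-central-1.compute.amazonaws.com:8080/completion \
#      -H 'Content-Type: application/json' -d @payload.json

# Show the request body that would be sent.
cat payload.json
```

Capping `n_predict` at 500 keeps generation under the 512-token window where the 2-bit quant starts looping.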