nisten committed
Commit cdd8bfb
1 Parent(s): 3a5fe36

Update README.md

Files changed (1): README.md (+10 -9)
README.md CHANGED
@@ -2,29 +2,30 @@
 license: apache-2.0
 ---
 
-Experimental quants of 4 expert MoE mixtrals in various GGUF formats.
-
-Goal is to have the best performing MoE < 10gb .
+# Experimental quants of 4-expert MoE Mixtrals in various GGUF formats
+
+## Goal is to have the best-performing MoE under 10 GB
 
 They still need training/finetuning.
 
-No sparsity tricks yet.
+#### No sparsity tricks yet.
 
 The 8.4 GB custom 2-bit quant works OK up to about a 512-token context, then starts looping.
 
 Install llama.cpp from GitHub:
 
-```
-git clone https://github.com/ggerganov/llama.cpp
-
-cd llama.cpp
-
-make -j
-
-wget https://huggingface.co/nisten/quad-mixtrals-gguf/resolve/main/4mixq2.gguf
-
-./server -m 4mixq2.gguf --host "ec2-3-99-206-122.ca-central-1.compute.amazonaws.com" -c 512```
+```bash
+git clone https://github.com/ggerganov/llama.cpp
+cd llama.cpp
+make -j
+wget https://huggingface.co/nisten/quad-mixtrals-gguf/resolve/main/4mixq2.gguf
+./server -m 4mixq2.gguf --host "ec2-3-99-206-122.ca-central-1.compute.amazonaws.com" -c 512
+```
 
 Limit output to 500 tokens.
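One way to enforce that 500-token cap is on the client side: the llama.cpp server's `/completion` endpoint accepts an `n_predict` field that limits how many tokens it generates. A minimal sketch, assuming the server above is reachable on its default port 8080; the host and prompt here are placeholders, not values from this repo:

```bash
# Cap generation at 500 tokens so the 2-bit quant stays inside the
# ~512-token window where it behaves, per the note above.
PAYLOAD='{"prompt": "Hello", "n_predict": 500}'

# Placeholder host: substitute the address your ./server is bound to.
curl -s http://localhost:8080/completion \
  -H 'Content-Type: application/json' \
  -d "$PAYLOAD" || true  # tolerate the server being down in this sketch
```

If the server is up, the response is JSON whose `content` field holds the generated text.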