nisten/quad-mixtrals-gguf

nisten committed on Dec 23, 2023

Commit 0e9fc27 (parent: cdd8bfb)

Update README.md

Files changed (1)

README.md +9 -9

README.md CHANGED

@@ -2,29 +2,29 @@
 license: apache-2.0
 ---
-#Experimental quants of 4 expert MoE mixtrals in various GGUF formats.
-##Goal is to have the best performing MoE < 10gb .
+** Experimental quants of 4 expert MoE mixtrals in various GGUF formats. **
+** Goal is to have the best performing MoE < 10gb **
 They still need training/finetuning.
-####No sparsity tricks yet.
+* * No sparsity tricks yet. * *
 8.4gb custom 2bit quant works ok up until 512 token length then starts looping.
-Install llama.cpp from github
+Install llama.cpp from github and run it
 ```bash
-`git clone https://github.com/ggerganov/llama.cpp`
-`cd llama.cpp`
-`make -j `
-`wget https://huggingface.co/nisten/quad-mixtrals-gguf/resolve/main/4mixq2.gguf`
-`./server -m 4mixq2.gguf --host "ec2-3-99-206-122.ca-central-1.compute.amazonaws.com" -c 512`
+git clone https://github.com/ggerganov/llama.cpp
+cd llama.cpp
+make -j
+wget https://huggingface.co/nisten/quad-mixtrals-gguf/resolve/main/4mixq2.gguf
+./server -m 4mixq2.gguf --host "ec2-3-99-206-122.ca-central-1.compute.amazonaws.com" -c 512
 ```
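Once the build and `./server` steps have run, you can query the model over HTTP. Below is a minimal client sketch that is not part of the original README. It makes these assumptions:

- The llama.cpp `server` example exposes a `/completion` endpoint that takes the JSON fields `prompt` and `n_predict`.
- The server uses its default port 8080, because the command sets only `--host`.

The helper names `json_escape`, `make_payload`, and `query_llama`, the `LLAMA_HOST`/`LLAMA_PORT` variables, and the 256-token default are hypothetical choices made here. The default cap on `n_predict` leaves room for the prompt inside the 512-token context (`-c 512`). Past that length, the 2-bit quant reportedly starts looping.

```shell
#!/bin/sh
# Minimal client sketch for the llama.cpp `server` started with -c 512.
# Assumptions (not from the original README): the server listens on its
# default port 8080, and prompts are single-line text.
LLAMA_HOST="${LLAMA_HOST:-ec2-3-99-206-122.ca-central-1.compute.amazonaws.com}"
LLAMA_PORT="${LLAMA_PORT:-8080}"

# Escape backslashes and double quotes so the prompt forms a valid JSON string.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Build the request body for the /completion endpoint.
# $1 = prompt, $2 = max tokens to generate (default 256, leaving room for
# the prompt inside the 512-token context).
make_payload() {
  printf '{"prompt": "%s", "n_predict": %s}' "$(json_escape "$1")" "${2:-256}"
}

# POST the prompt to the server and print the raw JSON response.
query_llama() {
  curl -s "http://${LLAMA_HOST}:${LLAMA_PORT}/completion" \
    -H "Content-Type: application/json" \
    -d "$(make_payload "$1" "$2")"
}
```

After sourcing the snippet, `query_llama "Write a haiku about GPUs" 128` prints the raw JSON response. The generated text should be in its `content` field. Prompts containing newlines would need extra escaping that this sketch does not do.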