nisten/quad-mixtrals-gguf


nisten committed on Dec 23, 2023

Commit e065fd7 (1 parent: 0e9fc27)

Update README.md

Files changed (1)

README.md +4 -4

README.md CHANGED

@@ -2,17 +2,17 @@
 license: apache-2.0
 ---
-** Experimental quants of 4 expert MoE mixtrals in various GGUF formats. **
-** Goal is to have the best performing MoE < 10gb **
+**Experimental quants of 4 expert MoE mixtrals in various GGUF formats.**
+**Goal is to have the best performing MoE < 10gb**
 They still need training/finetuning.
-* * No sparsity tricks yet. * *
+* *No sparsity tricks yet.* *
 8.4gb custom 2bit quant works ok up until 512 token length then starts looping.
-Install llama.cpp from github and run it
+Install llama.cpp from github and run it:
 ```bash