nisten/quad-mixtrals-gguf


nisten committed on Dec 23, 2023

Commit 8f79ba9 (1 parent: 994e63d)

Update README.md

Files changed (1)

README.md +3 -3


@@ -6,13 +6,13 @@ license: apache-2.0
 **Goal is to have the best performing MoE < 10gb**
->Experimental q8 and q4 files for training/finetuning too.
-* *No sparsity tricks yet.*
+Experimental q8 and q4 files for training/finetuning too.
+***No sparsity tricks yet.***
 8.4gb custom 2bit quant works ok up until 512 token length then starts looping.
-Install llama.cpp from github and run it:
+- Install llama.cpp from github and run it:
 ```bash