---
license: apache-2.0
---
**Experimental quants of 4-expert Mixtral MoE models in various GGUF formats.**
**The goal is the best-performing MoE under 10 GB.**
Experimental q8 and q4 files are included for training/finetuning as well.
***No sparsity tricks yet.***
The 8.4 GB custom 2-bit quant works fine up to a 512-token context length, then starts looping.
- Install llama.cpp from GitHub, download the model, and run the server:
```bash
# Build llama.cpp from source
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make -j
# Download the 2-bit quant and serve it with a 512-token context
wget https://huggingface.co./nisten/quad-mixtrals-gguf/resolve/main/4mixq2.gguf
./server -m 4mixq2.gguf --host "my.internal.ip.or.my.cloud.host.name.goes.here.com" -c 512
```
Limit output to 500 tokens per request to avoid the looping issue mentioned above.
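As a usage sketch, assuming the server's default port 8080 and the standard llama.cpp `/completion` endpoint, the token cap can be set per request with the `n_predict` parameter:

```bash
# Example request; adjust the host to match your --host setting (the prompt is arbitrary).
# n_predict caps generation at 500 tokens.
curl http://my.internal.ip.or.my.cloud.host.name.goes.here.com:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Why are Mixture-of-Experts models efficient?", "n_predict": 500}'
```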