Update README.md
README.md
CHANGED
@@ -16,12 +16,9 @@ Please be sure to set experts per token to 4 for the best results! Context lengt
 
 # Quanitized versions
 
-EXL2 (for fast GPU-only inference):
-
-
-
-4_25bpw: [coming soon] (for GPU's with 16+ GB of vram)
-
+EXL2 (for fast GPU-only inference): <br />
+6_0bpw: https://huggingface.co/Skylaude/WizardLM-2-4x7B-MoE-exl2-6_0bpw (for GPU's with 20+ GB of vram) <br />
+4_25bpw: [coming soon] (for GPU's with 16+ GB of vram) <br />
 3_0bpw: [coming soon] (for GPU's with 12+ GB of vram)
 
 GGUF (for mixed GPU+CPU inference or CPU-only inference):