Update README.md
README.md
# Quantized versions

EXL2 (for fast GPU-only inference): <br />
8_0bpw: https://huggingface.co/Skylaude/WizardLM-2-4x7B-MoE-exl2-8_0bpw <br />
6_0bpw: https://huggingface.co/Skylaude/WizardLM-2-4x7B-MoE-exl2-6_0bpw (for GPUs with 20+ GB of VRAM) <br />
5_0bpw: [coming soon] <br />
4_25bpw: https://huggingface.co/Skylaude/WizardLM-2-4x7B-MoE-exl2-4_25bpw (for GPUs with 16+ GB of VRAM) <br />
3_5bpw: https://huggingface.co/Skylaude/WizardLM-2-4x7B-MoE-exl2-3_5bpw <br />
3_0bpw: https://huggingface.co/Skylaude/WizardLM-2-4x7B-MoE-exl2-3_0bpw (for GPUs with 12+ GB of VRAM)
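
To use one of the EXL2 quants above, download the full repo locally. Below is a minimal sketch, assuming the `huggingface_hub` Python package is installed; the repo id is the 4_25bpw entry from the list, and the local directory name is just an illustrative choice:

```python
# Minimal download sketch (assumes `pip install huggingface_hub`).
# Swap the repo id for whichever bpw variant fits your GPU's VRAM.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Skylaude/WizardLM-2-4x7B-MoE-exl2-4_25bpw",
    local_dir="WizardLM-2-4x7B-MoE-exl2-4_25bpw",  # illustrative local path
)
```

Afterwards, point an EXL2-capable loader (e.g. text-generation-webui's ExLlamaV2 loader) at the downloaded directory.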
GGUF (for mixed GPU+CPU inference or CPU-only inference): <br />