Update README.md
README.md

Please be sure to set experts per token to 4 for the best results!
# Quantized versions

EXL2 (for fast GPU-only inference): <br />
8_0bpw: https://huggingface.co/Skylaude/WizardLM-2-4x7B-MoE-exl2-8_0bpw (25+ GB VRAM) <br />
6_0bpw: https://huggingface.co/Skylaude/WizardLM-2-4x7B-MoE-exl2-6_0bpw (20+ GB VRAM) <br />
5_0bpw: [coming soon] (16+ GB VRAM) <br />
4_25bpw: https://huggingface.co/Skylaude/WizardLM-2-4x7B-MoE-exl2-4_25bpw (14+ GB VRAM) <br />
3_5bpw: https://huggingface.co/Skylaude/WizardLM-2-4x7B-MoE-exl2-3_5bpw (12+ GB VRAM) <br />
3_0bpw: https://huggingface.co/Skylaude/WizardLM-2-4x7B-MoE-exl2-3_0bpw (11+ GB VRAM)

GGUF (for mixed GPU+CPU inference or CPU-only inference): <br />
https://huggingface.co/mradermacher/WizardLM-2-4x7B-MoE-GGUF <br />
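The VRAM figures in the EXL2 list above roughly track bits-per-weight (bpw): quantized weight size is approximately `total_params × bpw / 8` bytes, plus headroom for the KV cache and activations. A minimal back-of-envelope sketch, assuming ~24B total parameters for a 4x7B Mixtral-style MoE (that parameter count is an assumption, not stated in this README):

```python
# Rough VRAM estimate for EXL2 quants: weights ~= params * bpw / 8 bytes,
# plus extra headroom for KV cache and activations (not modeled here).

def est_weight_gib(params: float, bpw: float) -> float:
    """Approximate in-VRAM size of the quantized weights in GiB."""
    return params * bpw / 8 / 2**30

PARAMS = 24e9  # assumed total parameter count for a 4x7B Mixtral-style MoE

for bpw, listed in [(8.0, "25+"), (6.0, "20+"), (5.0, "16+"),
                    (4.25, "14+"), (3.5, "12+"), (3.0, "11+")]:
    print(f"{bpw} bpw: ~{est_weight_gib(PARAMS, bpw):.1f} GiB weights "
          f"(README lists {listed} GB VRAM incl. cache/overhead)")
```

The gap between the estimate and the listed figure (a few GB) is the context cache and runtime overhead, which grows with the context length you configure.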