NeoChen1024 committed on
Commit
c31b76a
1 Parent(s): 9859416

Update README.md

Files changed (1)
  1. README.md +8 -4
README.md CHANGED
@@ -3,7 +3,11 @@ base_model:
 - CausalLM/35b-beta-long
 ---
 # GGUF quants of CausalLM/35b-beta-long, here I have:
-* IQ4_XS (fits into 24GiB VRAM + 8192 context with q4_1 KV cache, also room for 2048 ubatch)
-* IQ4_NL (fits into 24GiB VRAM + 8192 context with q4_1 KV cache)
-* Q4_K_M (fits into 24GiB VRAM + 6144 context with q4_1 KV cache, also good for CPU inference on E5-26xx v3/v4)
-* Q8_0 (probably isn't practical for anything unless you have a big GPU array, imatrix derived from it)
+```
+IQ2_M  (10.1401 +/- 0.14062, fits into 24GiB VRAM + 24576 context with q4_1 KV cache, also room for 2048 ubatch)
+IQ4_XS ( 9.4489 +/- 0.13005, fits into 24GiB VRAM + 8192 context with q4_1 KV cache, also room for 2048 ubatch)
+IQ4_NL ( 9.4632 +/- 0.13056, fits into 24GiB VRAM + 8192 context with q4_1 KV cache)
+Q4_K_M ( 9.3738 +/- 0.12900, fits into 24GiB VRAM + 6144 context with q4_1 KV cache, also good for CPU inference on E5-26xx v3/v4)
+Q8_0   ( 9.3277 +/- 0.12781, probably isn't practical for anything unless you have a big GPU array, imatrix derived from it)
+```
+Perplexity measured with `-fa -ctv q4_1 -ctk q4_1 -c 2048 -ub 2048` on the UTF-8 text version of ["Wired Love" from Project Gutenberg](http://www.gutenberg.org/ebooks/24353).
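
The flags quoted above correspond to a llama.cpp `llama-perplexity` invocation along these lines; a sketch only, where the model and text file names are hypothetical placeholders, not files shipped in this repo:

```shell
# Sketch of the perplexity measurement described above.
# Model and text file names are hypothetical examples.
# -fa enables flash attention; -ctk/-ctv set the KV cache types to q4_1;
# -c 2048 is the context size and -ub 2048 the micro-batch size.
./llama-perplexity \
    -m 35b-beta-long.IQ4_XS.gguf \
    -f wired-love-utf8.txt \
    -fa -ctv q4_1 -ctk q4_1 -c 2048 -ub 2048
```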