Commit 86709bd · Update README.md
Parent(s): b65c13a
README.md CHANGED
@@ -11,9 +11,15 @@ This is a second prototype of SuperHOT, this time with 4K context and no RLHF. I
 - 13B 8K CUDA (no groupsize): [tmpupload/superhot-13b-8k-no-rlhf-test-GPTQ](https://huggingface.co/tmpupload/superhot-13b-8k-no-rlhf-test-GPTQ)
 - 13B 8K CUDA 32g: [tmpupload/superhot-13b-8k-no-rlhf-test-32g-GPTQ](https://huggingface.co/tmpupload/superhot-13b-8k-no-rlhf-test-32g-GPTQ)
 
-
-
-
+#### Using the monkey-patch?
+You will **NEED** to **apply the monkeypatch** or, if you are already using the monkeypatch, **change the scaling factor to 0.25 and the maximum sequence length to 8192**.
+
+#### Using Oobabooga with Exllama?
+- `python server.py --max_seq_len 8192 --compress_pos_emb 4 --loader exllama_hf`
+
+In order to use the 8K context, you will need to apply the monkeypatch I have added in this repo or follow the instructions for oobabooga's text-generation-webui -- **without it, it will not work**.
+
+I will repeat: **Without the patch with the correct scaling and max sequence length, it will not work!**
 
 The patch is very simple, and you can make the changes yourself:
 - Increase the `max_position_embeddings` to 8192 to stretch the sinusoidal
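
For readers who want to see what "stretch the sinusoidal" means in code, here is a minimal sketch of the position-interpolation idea the diff describes (raise `max_position_embeddings` to 8192, scale positions by 0.25), assuming a LLaMA-style rotary embedding. The class name `ScaledRotaryEmbedding` and the monkeypatch target named in the comments are illustrative assumptions, not the repo's actual patch file:

```python
# Illustrative sketch only -- not the repo's monkeypatch. Positions
# advance in steps of `scale` instead of 1, so 8192 real positions map
# into the 2048-position range of the base model (8192 * 0.25 == 2048).
import torch


class ScaledRotaryEmbedding(torch.nn.Module):
    def __init__(self, dim, max_position_embeddings=8192, scale=0.25, base=10000):
        super().__init__()
        # Standard RoPE inverse frequencies.
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        # The "stretch": interpolated position indices 0, 0.25, 0.5, ...
        t = torch.arange(max_position_embeddings).float() * scale
        freqs = torch.outer(t, inv_freq)
        emb = torch.cat((freqs, freqs), dim=-1)
        self.register_buffer("cos_cached", emb.cos())
        self.register_buffer("sin_cached", emb.sin())

    def forward(self, x, seq_len):
        # Return the cached sin/cos tables for the first `seq_len` positions.
        return self.cos_cached[:seq_len], self.sin_cached[:seq_len]


# A monkeypatch would then swap this class in before the model loads, e.g.:
#   from transformers.models.llama import modeling_llama
#   modeling_llama.LlamaRotaryEmbedding = ScaledRotaryEmbedding
```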
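Note that the two routes in the diff express the same transform: oobabooga's `--compress_pos_emb 4` divides position indices by 4, the reciprocal of the monkeypatch's 0.25 scaling factor, and `--max_seq_len 8192` raises the context window to match. If the two settings disagree (say, compression 4 with a scaling factor other than 0.25), the model sees positions it was never fine-tuned on, which is why the README insists on this exact pair of values.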