Lewdiculous committed
Commit 683132a • 1 Parent(s): b1d5d9c
att --quantkv mention
README.md CHANGED
@@ -1,9 +1,43 @@
 ---
+base_model: Sao10K/Llama-3.1-8B-Stheno-v3.4
+quantized_by: Lewdiculous
+library_name: transformers
 license: cc-by-nc-4.0
+inference: false
+language:
+- en
+tags:
+- roleplay
+- llama3
+- sillytavern
 ---
-Quants for [Sao10K/Llama-3.1-8B-Stheno-v3.4](https://huggingface.co/Sao10K/Llama-3.1-8B-Stheno-v3.4).
+Quants for [**Sao10K/Llama-3.1-8B-Stheno-v3.4**](https://huggingface.co/Sao10K/Llama-3.1-8B-Stheno-v3.4).
 
-
+I recommend checking their page for feedback and support.
+
+> [!IMPORTANT]
+> **Quantization process:** <br>
+> Imatrix data was generated from the FP16-GGUF, and conversions were made directly from the BF16-GGUF. <br>
+> This hopefully avoids losses during conversion. <br>
+> To run this model, please use the [**latest version of KoboldCpp**](https://github.com/LostRuins/koboldcpp/releases/latest). <br>
+> If you notice any issues, let me know in the discussions.
+
+> [!NOTE]
+> **Presets:** <br>
+> Some compatible SillyTavern presets can be found [**here (Virt's Roleplay Presets - v1.9)**](https://huggingface.co/Virt-io/SillyTavern-Presets). <br>
+> Check [**discussions such as this one**](https://huggingface.co/Virt-io/SillyTavern-Presets/discussions/5#664d6fb87c563d4d95151baa) and [**this one**](https://www.reddit.com/r/SillyTavernAI/comments/1dff2tl/my_personal_llama3_stheno_presets/) for other preset and sampler recommendations. <br>
+> The authors recommend lower temperatures, so make sure to experiment. <br>
+>
+> **General usage with KoboldCpp:** <br>
+> For **8GB VRAM** GPUs, I recommend the **Q4_K_M-imat** (4.89 BPW) quant for context sizes up to 12288 without the use of `--quantkv`. <br>
+> Using `--quantkv 1` (≈Q8) or even `--quantkv 2` (≈Q4) can get you to a 32K context size, with the caveat that it is not compatible with Context Shifting; this only matters if you can fill up that much context. <br>
+> [**Read more about it in the v1.67 release notes**](https://github.com/LostRuins/koboldcpp/releases/tag/v1.67).
+
+
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/65d4cf2693a0a3744a27536c/GV63jjNPXvSG-BSOGuP5h.png)
+
+<details>
+<summary>Click here for the original model card information.</summary>
 
 ![img](https://huggingface.co/Sao10K/Llama-3.1-8B-Stheno-v3.4/resolve/main/meneno.jpg)
 
@@ -66,4 +100,6 @@ Below are some graphs and all for you to observe.
 Have a good one.
 
 ```
-Source Image: https://www.pixiv.net/en/artworks/91689070
+Source Image: https://www.pixiv.net/en/artworks/91689070
+
+</details>
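
For readers who want to apply the `--quantkv` advice added in this commit, the snippet below is a minimal sketch of a matching KoboldCpp launch, wrapped in Python. The GGUF filename, GPU layer count, and port are illustrative assumptions rather than values from the commit; `--flashattention` is included because `--quantkv` requires it per the KoboldCpp v1.67 release notes.

```python
# Minimal sketch: launch KoboldCpp with a quantized KV cache for a 32K context.
# The filename, --gpulayers, and --port values are assumptions; adjust for your setup.
import subprocess

cmd = [
    "python", "koboldcpp.py",
    "--model", "Llama-3.1-8B-Stheno-v3.4-Q4_K_M-imat.gguf",  # hypothetical local quant file
    "--contextsize", "32768",   # 32K context, as discussed in the usage note
    "--flashattention",         # required for --quantkv
    "--quantkv", "1",           # 1 ~= Q8 KV cache (2 ~= Q4); disables Context Shifting
    "--gpulayers", "33",        # example offload for an 8B model on an 8GB VRAM GPU
    "--port", "5001",
]
subprocess.run(cmd, check=True)
```

For the 12288-context setup without KV cache quantization, drop `--flashattention` and `--quantkv` and set `--contextsize` to `12288`.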