Lewdiculous committed on
Commit 683132a
1 Parent(s): b1d5d9c

att --quantkv mention

Files changed (1):
  1. README.md +39 -3
README.md CHANGED
@@ -1,9 +1,43 @@
 ---
+base_model: Sao10K/Llama-3.1-8B-Stheno-v3.4
+quantized_by: Lewdiculous
+library_name: transformers
 license: cc-by-nc-4.0
+inference: false
+language:
+- en
+tags:
+- roleplay
+- llama3
+- sillytavern
 ---
-Quants for [Sao10K/Llama-3.1-8B-Stheno-v3.4](https://huggingface.co/Sao10K/Llama-3.1-8B-Stheno-v3.4).
+Quants for [**Sao10K/Llama-3.1-8B-Stheno-v3.4**](https://huggingface.co/Sao10K/Llama-3.1-8B-Stheno-v3.4).
 
-### Original model card:
+I recommend checking their page for feedback and support.
+
+> [!IMPORTANT]
+> **Quantization process:** <br>
+> Imatrix data was generated from the FP16-GGUF and conversions were done directly from the BF16-GGUF. <br>
+> This hopefully avoids losses during conversion. <br>
+> To run this model, please use the [**latest version of KoboldCpp**](https://github.com/LostRuins/koboldcpp/releases/latest). <br>
+> If you notice any issues, let me know in the discussions.
+
+> [!NOTE]
+> **Presets:** <br>
+> Some compatible SillyTavern presets can be found [**here (Virt's Roleplay Presets - v1.9)**](https://huggingface.co/Virt-io/SillyTavern-Presets). <br>
+> Check [**discussions such as this one**](https://huggingface.co/Virt-io/SillyTavern-Presets/discussions/5#664d6fb87c563d4d95151baa) and [**this one**](https://www.reddit.com/r/SillyTavernAI/comments/1dff2tl/my_personal_llama3_stheno_presets/) for other preset and sampler recommendations. <br>
+> The authors recommend lower temperatures, so make sure to experiment. <br>
+>
+> **General usage with KoboldCpp:** <br>
+> For **8GB VRAM** GPUs, I recommend the **Q4_K_M-imat** (4.89 BPW) quant for context sizes up to 12288 without the use of `--quantkv`. <br>
+> Using `--quantkv 1` (≈Q8) or even `--quantkv 2` (≈Q4) can get you to 32K context sizes, with the caveat of not being compatible with Context Shifting; that only matters if you can manage to fill up that much context. <br>
+> [**Read more about it in the release notes here**](https://github.com/LostRuins/koboldcpp/releases/tag/v1.67).
+
+
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/65d4cf2693a0a3744a27536c/GV63jjNPXvSG-BSOGuP5h.png)
+
+<details>
+<summary>Click here for the original model card information.</summary>
 
 ![img](https://huggingface.co/Sao10K/Llama-3.1-8B-Stheno-v3.4/resolve/main/meneno.jpg)
 
@@ -66,4 +100,6 @@ Below are some graphs and all for you to observe.
 Have a good one.
 
 ```
-Source Image: https://www.pixiv.net/en/artworks/91689070
+Source Image: https://www.pixiv.net/en/artworks/91689070
+
+</details>
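For reference, the `--quantkv` recommendation in the added note translates into a KoboldCpp launch command along these lines. This is a minimal sketch, not taken from the commit: the GGUF filename is a placeholder for whichever quant you downloaded, and `--contextsize 32768` reflects the 32K figure mentioned above.

```shell
# Sketch: KoboldCpp with a quantized KV cache for larger context sizes.
# Model filename is illustrative. Per the v1.67 release notes referenced
# above, --quantkv 1 stores the KV cache at roughly Q8 (use 2 for ~Q4),
# at the cost of disabling Context Shifting.
./koboldcpp \
  --model Llama-3.1-8B-Stheno-v3.4-Q4_K_M-imat.gguf \
  --contextsize 32768 \
  --quantkv 1
```

Without `--quantkv`, the note suggests staying at or below a 12288 context size on 8GB VRAM GPUs with the Q4_K_M-imat quant.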