---
base_model: nintwentydo/Razorback-12B-v0.2
base_model_relation: quantized
library_name: transformers
tags:
- mergekit
- merge
- multimodal
- mistral
- pixtral
language:
- en
- fr
- de
- es
- it
- pt
- ru
- zh
- ja
license: other
pipeline_tag: image-text-to-text
---

# Razorback 12B v0.2 ExLlamaV2 6.0bpw Quant
#### UnslopNemo with Vision!

<img src="https://huggingface.co/nintwentydo/Razorback-12B-v0.1/resolve/main/razorback.jpg" style="width: 100%; max-width:700px"></img>

A more robust attempt at merging TheDrummer's UnslopNemo v3 into Pixtral 12B.

It has been very stable in my testing so far, though it needs more testing to work out which samplers it does and doesn't like.

It seems to be the best of both worlds: less sloppy, more engaging output with decent intelligence and visual understanding.

## Merging Approach
First, I loaded up Pixtral 12B Base and Mistral Nemo Base to compare their parameter differences.
Looking at the L2 norm / relative difference values, I was able to isolate which parts of Pixtral 12B deviate significantly from Mistral Nemo.
While the language model architecture is the same between the two, a lot of vision understanding has been trained into Pixtral's language model, and it can break very easily.

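A minimal sketch of how such a per-parameter comparison can work. The helper name and the use of flat float lists (rather than torch tensors over real state dicts) are my own illustrative assumptions, not this repo's actual merge code:

```python
import math

def relative_l2_diff(pixtral_param, nemo_param):
    """Relative L2 difference between two parameter tensors (flattened).

    A value near 0 means Pixtral left this parameter close to the Mistral
    Nemo base; a large value suggests vision-critical training that a
    naive merge could easily break.
    """
    num = math.sqrt(sum((p - n) ** 2 for p, n in zip(pixtral_param, nemo_param)))
    den = math.sqrt(sum(n ** 2 for n in nemo_param))
    return num / den if den > 0 else num

# An unchanged parameter has a relative difference of 0.0
print(relative_l2_diff([1.0, 2.0], [1.0, 2.0]))  # 0.0
```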
Then I calculated merging weights for each parameter using an exponential falloff: the smaller the difference, the higher the weight.

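An exponential falloff of this kind could look like the following. The falloff constant and function names are illustrative choices of mine, not the exact recipe used here:

```python
import math

def merge_weight(diff, falloff=10.0):
    """Exponential falloff: a small Pixtral-vs-Nemo difference gives a
    weight near 1, so the donor model contributes more where vision
    training changed little."""
    return math.exp(-falloff * diff)

def merge_param(pixtral_val, donor_val, diff, falloff=10.0):
    # Per-parameter linear interpolation toward the donor (UnslopNemo);
    # vision-critical parameters (large diff) stay close to Pixtral.
    w = merge_weight(diff, falloff)
    return (1.0 - w) * pixtral_val + w * donor_val

print(merge_weight(0.0))  # 1.0: identical parameters take the donor fully
```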
I applied this recipe to Pixtral Instruct (Pixtral-12B-2409) and TheDrummer's UnslopNemo-12B-v3. The goal was to infuse as much Drummer goodness as possible without breaking vision input. And it looks like it's worked!

## Usage
More testing is needed to identify the best sampling params, but so far ~0.7 temperature + 0.03 min_p has been rock solid.

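For reference, min_p sampling keeps only tokens whose probability is at least `min_p` times the top token's probability, which scales the cutoff with model confidence. A simplified sketch of the idea, not this repo's inference code:

```python
def min_p_filter(probs, min_p=0.03):
    """Zero out tokens below min_p * max(probs), then renormalize."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]

# With a confident 0.9 top token, a 0.001-probability token is dropped
print(min_p_filter([0.9, 0.099, 0.001]))
```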
Use the included chat template (Mistral). No ChatML support yet.

## Credits
- Mistral for [mistralai/Pixtral-12B-2409](https://huggingface.co/mistralai/Pixtral-12B-2409)
- Unsloth for [unsloth/Pixtral-12B-2409](https://huggingface.co/unsloth/Pixtral-12B-2409) transformers conversion
- TheDrummer for [TheDrummer/UnslopNemo-12B-v3](https://huggingface.co/TheDrummer/UnslopNemo-12B-v3)