nintwentydo committed (verified) · commit 48b792d · 1 parent: 155b2a3

Create README.md
---
base_model: nintwentydo/Razorback-12B-v0.2
base_model_relation: quantized
library_name: transformers
tags:
- mergekit
- merge
- multimodal
- mistral
- pixtral
language:
- en
- fr
- de
- es
- it
- pt
- ru
- zh
- ja
license: other
pipeline_tag: image-text-to-text
---

# Razorback 12B v0.2 ExLlamaV2 6.0bpw Quant
#### UnslopNemo with Vision!

<img src="https://huggingface.co/nintwentydo/Razorback-12B-v0.1/resolve/main/razorback.jpg" style="width: 100%; max-width:700px"></img>

A more robust attempt at merging TheDrummer's UnslopNemo v3 into Pixtral 12B.

It has been very stable in my testing so far, though it needs more testing to establish which samplers it does and doesn't like.

It seems to be the best of both worlds: less sloppy, more engaging output, with decent intelligence and visual understanding.

## Merging Approach
First, I loaded up Pixtral 12B Base and Mistral Nemo Base to compare their parameter differences.
Looking at the L2 norm / relative difference values, I was able to isolate which parts of Pixtral 12B deviate significantly from Mistral Nemo.
This matters because, while the language-model architecture is the same between the two, a lot of vision understanding has been trained into Pixtral's language model and can break very easily.
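
The comparison step described above can be sketched as follows. This is a pure-Python stand-in for the tensor operations (the function name and example values are illustrative, not the author's actual script):

```python
import math

def l2_norm(values):
    """Euclidean (L2) norm of a flattened parameter."""
    return math.sqrt(sum(v * v for v in values))

def relative_difference(ref, other):
    """||other - ref|| / ||ref|| for one (flattened) parameter tensor."""
    diff = [o - r for r, o in zip(ref, other)]
    return l2_norm(diff) / (l2_norm(ref) + 1e-12)

# A parameter with a large relative difference carries Pixtral-specific
# vision training and should be protected during the merge.
nemo_param    = [1.0, 1.0, 1.0, 1.0]
pixtral_param = [1.1, 1.1, 1.1, 1.1]
print(relative_difference(nemo_param, pixtral_param))  # 0.1
```

Parameters whose relative difference is near zero are safe to merge aggressively; large values flag the vision-sensitive regions.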

Then I calculated merging weights for each parameter using an exponential falloff: the smaller the difference, the higher the weight.
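
The exponential falloff might look something like this minimal sketch. The decay constant `k` is a hypothetical value, not the one actually used:

```python
import math

def merge_weight(rel_diff, k=10.0):
    """Donor-model blend weight with exponential falloff: parameters that
    barely differ get a weight near 1, heavily diverged ones near 0.
    (k is an assumed decay constant, not the author's actual value.)"""
    return math.exp(-k * rel_diff)

print(merge_weight(0.0))  # 1.0 — identical parameters merge fully
print(merge_weight(0.5))  # ≈ 0.0067 — diverged (vision-critical) parameters stay protected
```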

I applied this recipe to Pixtral Instruct (Pixtral-12B-2409) and TheDrummer's UnslopNemo-12B-v3. The goal was to infuse as much Drummer goodness as possible without breaking vision input, and it looks like it worked!
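
Putting the pieces together, the per-parameter blend presumably amounts to a weighted interpolation along these lines (a sketch only; the actual recipe, via mergekit per the repo tags, is not reproduced here):

```python
def merge_param(base, donor, weight):
    """Linearly interpolate one parameter: weight=1 takes the donor
    (UnslopNemo) value, weight=0 keeps the base (Pixtral) value."""
    return [(1 - weight) * b + weight * d for b, d in zip(base, donor)]

pixtral = [1.0, 2.0]
unslop  = [3.0, 4.0]
print(merge_param(pixtral, unslop, 0.5))  # [2.0, 3.0]
print(merge_param(pixtral, unslop, 0.0))  # [1.0, 2.0] — vision-critical param untouched
```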

## Usage
Needs more testing to identify the best sampling parameters, but so far ~0.7 temperature with 0.03 min-p has been rock solid.
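
For reference, min-p filtering (the 0.03 cutoff above) keeps only tokens whose probability is at least `min_p` times the top token's probability; a minimal pure-Python sketch, not tied to any particular backend:

```python
def min_p_filter(probs, min_p=0.03):
    """Zero out tokens below min_p * max_prob, then renormalize (sketch)."""
    cutoff = min_p * max(probs)
    kept = [p if p >= cutoff else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]

print(min_p_filter([0.90, 0.05, 0.04, 0.01]))  # drops the 0.01 tail token
```

The cutoff scales with the top token's confidence, which is why min-p tends to pair well with moderate temperatures like 0.7.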

Use the included (Mistral) chat template. No ChatML support yet.

## Credits
- Mistral for [mistralai/Pixtral-12B-2409](https://huggingface.co/mistralai/Pixtral-12B-2409)
- Unsloth for the [unsloth/Pixtral-12B-2409](https://huggingface.co/unsloth/Pixtral-12B-2409) transformers conversion
- TheDrummer for [TheDrummer/UnslopNemo-12B-v3](https://huggingface.co/TheDrummer/UnslopNemo-12B-v3)