This is a merge of pre-trained language models created using [mergekit](https://github.com/arcee-ai/mergekit).
EQ-Bench:

| Tasks  |Version|Filter|n-shot|      Metric     |   |  Value |   |Stderr|
|--------|------:|------|-----:|-----------------|---|-------:|---|-----:|
|eq_bench|    2.1|none  |     0|eqbench          |↑  | 78.7955|±  |1.4668|
|        |       |none  |     0|percent_parseable|↑  |100.0000|±  |0.0000|

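These scores are in the output format of EleutherAI's lm-evaluation-harness, which ships an `eq_bench` task (version 2.1, matching the table). A sketch of the kind of command that produces such a table is below; the model id is a placeholder, since this card doesn't state the merged repo's name:

```sh
# Sketch, assuming lm-evaluation-harness is installed (pip install lm-eval).
# "Lambent/merged-model-7B" is a placeholder for the actual merged repo id.
lm_eval --model hf \
  --model_args pretrained=Lambent/merged-model-7B,dtype=bfloat16 \
  --tasks eq_bench \
  --num_fewshot 0
```
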
Version 0.3 involved three separate tunes, stock-merged on overlapping datasets for long-context writing, multi-turn conversation, and RP, with a touch of poetry and code.

From there, each of the four threads was separately task-tuned on two datasets.

Various methods of combining those via merging were tested; this one scored highest on EQ-Bench, which served as an indicator.

My understanding of the Model Stock merge method is that it mitigates task adaptation to a significant degree, but also significantly limits the forgetting caused by training.

I have hope that the adaptation, especially over two stages, is still sufficient to help with the longer contexts and multi-turn conversations carried over from the ancestor models, and to add some individual style while retaining a fair amount of their capability.
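
For reference, my paraphrase of the Model Stock paper (not something verified against this particular merge): it picks a per-layer interpolation ratio from the angle between the fine-tuned models' task vectors and pulls their average back toward the pretrained base,

$$
t = \frac{k\cos\theta}{1 + (k-1)\cos\theta}, \qquad w_{\mathrm{merged}} = t\,w_{\mathrm{avg}} + (1-t)\,w_{0},
$$

where $w_0$ is the base weight, $w_{\mathrm{avg}}$ is the average of the $k$ fine-tuned weights, and $\theta$ is the (assumed roughly shared) angle between their task vectors. Small angles keep more of the adaptation; large angles lean on the base, which matches the mitigated-adaptation, limited-forgetting behavior described above.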

This model's refusals are... not nonexistent, but certainly don't rely on them.

To my knowledge it has no particular refusal behavior for merely NSFW content, but I haven't exactly exhaustively tested which OSHA violations it will aid and abet.

### Merge Method
This model was merged using the [Model Stock](https://arxiv.org/abs/2403.19522) merge method using [Lambent/threebird-scribe-alpha0.3-7B](https://huggingface.co/Lambent/threebird-scribe-alpha0.3-7B) as a base.
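
### Configuration

This card doesn't include the mergekit YAML, so the following is a minimal sketch of what a Model Stock merge of this shape looks like. The four task-tuned "thread" ids are hypothetical placeholders; only `base_model` is taken from this card:

```yaml
# Hypothetical mergekit config for a Model Stock merge of this shape.
# The four "thread" model ids are placeholders; only base_model comes from the card.
models:
  - model: Lambent/thread-a-tasktuned-7B  # placeholder
  - model: Lambent/thread-b-tasktuned-7B  # placeholder
  - model: Lambent/thread-c-tasktuned-7B  # placeholder
  - model: Lambent/thread-d-tasktuned-7B  # placeholder
merge_method: model_stock
base_model: Lambent/threebird-scribe-alpha0.3-7B
dtype: bfloat16
```

A config like this would be run with `mergekit-yaml config.yml ./merged-model`.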