model invents words and acts weirdly
I am aware this model acts weird and invents new words; just making this an issue so people don't bring it up and I remember. Might be fixable, not sure
When enlarging a model, usually passthrough is used. It's a similar method to how SOLAR was made.
I don't know the difference but maybe try it again using passthrough?
https://huggingface.co./froggeric/WestLake-10.7B-v2
You can see an example here
And you should be able to stack a number of models; as long as the final product is 48 layers, it will be 11B/10.7B
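For reference, a passthrough merge like the one above is typically done with a mergekit config along these lines. This is a minimal sketch, assuming two copies of the same 32-layer 7B base model (the model name and layer ranges here are illustrative, not the exact recipe used for the linked WestLake-10.7B-v2); two 24-layer slices with overlap yield the 48-layer / ~10.7B result:

```yaml
# Hypothetical mergekit passthrough config (assumption: 32-layer base model)
slices:
  - sources:
      - model: senseable/WestLake-7B-v2   # illustrative base model
        layer_range: [0, 24]              # first 24 layers
  - sources:
      - model: senseable/WestLake-7B-v2
        layer_range: [8, 32]              # last 24 layers, overlapping 8-24
merge_method: passthrough
dtype: float16
```

The key point is that the slices overlap in the middle rather than being stacked as two full [0, 32] copies; duplicating entire models end to end is what tends to mangle the layers.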
Just a possible reason why it might be making up words: the layers may have been mangled?
That explains a lot actually, thanks. I wasn't able to directly stack the layers (e.g. [0, 32]) with this approach without it going zodiac killer, even with just two models, which confused me since others could. Will retry with that tomorrow and see what happens, maybe with more parameters too if it goes well, but likely I'll do two versions :3
Just made a merge with it; it seems promising so far and a meaningful improvement. I will test it more and likely upload it afterwards. It still needs some fixing to get rid of the weirdness, but there's much less of it now
New version! Somewhat of a sidegrade, but mostly better
https://huggingface.co./nonetrix/pippafeet-11B-0.2
Sidegrades are common when upscaling models. It usually won't meaningfully increase performance, but the result can be useful as a base to train on.
Pretty much every model upscale I've seen has lost benchmark points but performs the same, if not slightly better, in reality, generally resulting in a model that is better at writing and slightly less smart.
I'll try 0.2 soon, it's easier now that there are GGUFs :3